(54) Primers for synthesising full-length cDNA and their use



(57) Primers for synthesizing full-length cDNAs and 
their use are provided. 

5602 cDNAs encoding human proteins have been isolated, and the nucleotide sequences of their 5'- and 3'-ends have been determined. Furthermore, primers for synthesizing the full-length cDNAs have been provided to clarify the functions of the proteins encoded by the cDNAs. The full-length cDNAs of the present invention, which contain the translation start site, provide information useful for analyzing the functions of the proteins.
Description

FIELD OF THE INVENTION 

[0001] The present invention relates to a polynucleotide encoding a novel protein, a protein encoded by the polynucleotide, and new uses of these.

BACKGROUND OF THE INVENTION 

[0002] Currently, sequencing projects for the determination and analysis of the genomic DNA of various living organisms are in progress all over the world. The whole genomic sequences of more than 10 species of prokaryotes, a lower eukaryote (yeast), and a multicellular eukaryote (C. elegans) have already been determined. As to the human genome, which is supposed to be composed of three thousand million base pairs, worldwide cooperative projects are under way to analyze it, and the whole structure is predicted to be determined by 2002-2003. The aim of determining genomic sequences is to reveal the functions of all genes and their regulation, and to understand living organisms as a network of interactions between genes, proteins, cells, or individuals by deducing the information in a genome, which is a blueprint of highly complicated living organisms. Understanding living organisms by utilizing the genomic information from various species is not only important as an academic subject, but also socially significant from the viewpoint of industrial application.

[0003] However, determination of genomic sequences by itself cannot identify the functions of all genes. For example, in yeast, the functions of only approximately half of the 6000 genes predicted from the genomic sequence could be deduced. In humans, the number of genes is predicted to be approximately one hundred thousand. Therefore, it is desirable to establish "a high throughput analysis system of gene functions" which allows rapid and efficient identification of the functions of the vast number of genes obtained by genomic sequencing.

[0004] Many genes in the eukaryotic genome are split by introns into multiple exons. Thus, it is difficult to correctly predict the structure of an encoded protein solely from genomic information. In contrast, cDNA, which is produced from mRNA that lacks introns, encodes a protein as a single continuous amino acid sequence and allows the primary structure of the protein to be identified easily. In human cDNA research, more than one million ESTs (Expressed Sequence Tags) are publicly available to date, and the ESTs presumably cover not less than 80% of all human genes.

[0005] EST information is utilized for analyzing the structure of the human genome, or for predicting the exon regions of genomic sequences or their expression profiles. However, many human ESTs have been derived from regions proximal to the 3'-end of cDNA, and information around the 5'-end of mRNA is extremely scarce. Among these human cDNAs, the number of corresponding mRNAs whose encoded protein sequences have been deduced is approximately 7000, and of these only 5500 are full-length. Thus, even including cDNAs registered as ESTs, the human cDNAs obtained so far are estimated to cover 10-15% of all genes.

[0006] Based on the 5'-end sequence of a full-length cDNA, it is possible to identify the transcription start site of the mRNA on the genomic sequence, and to analyze factors contained in the cDNA that are involved in mRNA stability or in the regulation of expression at the translation stage. Also, since a full-length cDNA contains the ATG translation start site in its 5'-region, it can be translated into a protein in the correct frame. Therefore, by utilizing an appropriate expression system, it is possible to produce a large amount of the protein encoded by the cDNA or to analyze the biological activity of the expressed protein. Thus, analysis of a full-length cDNA provides valuable information that complements the information from genome sequencing. Also, full-length cDNA clones that can be expressed are extremely valuable in the empirical analysis of gene function and in industrial application.

[0007] Therefore, if a novel human full-length cDNA is isolated, it can be used for developing medicines for diseases in which the gene is involved. The protein encoded by the gene can itself be used as a drug. Thus, obtaining a full-length cDNA encoding a novel human protein is of great significance.

[0008] In particular, human secretory proteins or membrane proteins would be useful by themselves as medicines, like tissue plasminogen activator (TPA), or as targets of medicines, like membrane receptors. In addition, genes for signal transduction-associated proteins (protein kinases, etc.), glycoprotein-associated proteins, transcription-associated proteins, etc. are genes whose relationships to human diseases have been elucidated. Moreover, genes for disease-associated proteins form a gene group rich in genes whose relationships to human diseases have been elucidated.
[0009] Therefore, it is of great significance to isolate novel human full-length cDNA clones, only a few of which have been isolated. In particular, isolation of a novel cDNA clone encoding a secretory protein or membrane protein is desired, since the protein itself would be useful as a medicine, and the clones potentially include genes associated with diseases. In addition, genes encoding proteins that are associated with signal transduction, glycoproteins, transcription, or diseases are expected to be useful as target molecules for therapy, or as medicines themselves. These genes form a gene group predicted to be strongly associated with diseases. Thus, identification of the full-length cDNA clones encoding those proteins is of great significance.
SUMMARY OF THE INVENTION 

[0010] An objective of the present invention is to provide a polynucleotide encoding a novel protein, a protein encoded by said polynucleotide, and novel uses of these.

[0011] The inventors have developed a method for efficiently cloning a human full-length cDNA, predicted by ATGpr and other methods to be a full-length cDNA clone, from a full-length-enriched cDNA library synthesized by the oligo-capping method. The inventors then determined the nucleotide sequences of the obtained cDNA clones from both the 5'- and 3'-ends.

[0012] Furthermore, the inventors analyzed the obtained clones by BLAST searches of the databases SwissProt (http://www.ebi.ac.uk/ebi_docs/SwissProt_db/swisshome.html), GenBank (http://www.ncbi.nlm.nih.gov/web/GenBank) and UniGene (Human) (http://www.ncbi.nlm.nih.gov/UniGene).

[0013] The full-length cDNA clones of the present invention have a high fullness ratio, since they were obtained by the combination of (1) construction of a full-length-enriched cDNA library synthesized by the oligo-capping method and (2) a system in which the fullness ratio is evaluated from the nucleotide sequence of the 5'-end (selection based on the ATGpr, after removing sequences completely identical to ESTs). However, the primers of the present invention make it possible to obtain full-length cDNA easily, without any specialized method such as the one described above.

Homology analysis of a cDNA fragment that is not full-length is commonly performed to postulate the function of the protein encoded by that fragment. However, since such analysis is based on the information of the fragment, it is not clear whether the fragment corresponds to a functionally important part of the protein. In other words, the reliability of homology analysis based on fragment information is doubtful, because information on the structure of the whole protein is not available. In contrast, the homology analysis of the present invention is conducted based on the information of a full-length cDNA comprising the whole coding region, and therefore the homology of various portions of the protein can be analyzed. Hence, the reliability of the homology analysis has been dramatically improved in the present invention.

[0014] The inventors completed the invention by finding that it is possible to synthesize a novel full-length cDNA by using the combination of a primer designed based on the nucleotide sequence of the 5'-ends of the selected full-length cDNA clones and either an oligo-dT primer or a 3'-primer designed based on the nucleotide sequence of the 3'-ends of the selected clones.

[0015] Thus, the present invention relates to primers described below, a method for synthesizing a polynucleotide 
using the primers, and polynucleotides obtained by the method. 
[0016] First, the present invention relates to

(1) use of an oligonucleotide as a primer for synthesizing the polynucleotide comprising the nucleotide sequence set forth in any one of SEQ ID NOs: 1-5547 and SEQ ID NOs: 16111-16164, or the complementary strand thereof, wherein said oligonucleotide is complementary to said polynucleotide or the complementary strand thereof and comprises at least 15 nucleotides;

(2) a primer set for synthesizing polynucleotides, the primer set comprising an oligo-dT primer and an oligonucleotide complementary to the complementary strand of the polynucleotide comprising the nucleotide sequence set forth in any one of SEQ ID NOs: 1-5547 and SEQ ID NOs: 16111-16164, wherein said oligonucleotide comprises at least 15 nucleotides; and

(3) a primer set for synthesizing polynucleotides, the primer set comprising a combination of an oligonucleotide comprising a nucleotide sequence complementary to the complementary strand of the polynucleotide comprising a 5'-end nucleotide sequence and an oligonucleotide comprising a nucleotide sequence complementary to the polynucleotide comprising a 3'-end nucleotide sequence, wherein said oligonucleotides comprise at least 15 nucleotides and wherein said combination of 5'-end nucleotide sequence / 3'-end nucleotide sequence is selected from the combinations of 5'-end nucleotide sequence / 3'-end nucleotide sequence set forth in the SEQ ID NOs in Tables 1 and 2.
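As a minimal sketch of primer set (3), the Python below derives a 5'-primer and a 3'-primer from a clone's 5'-end and 3'-end sequences. It assumes both end sequences are written 5' to 3' on the sense (mRNA) strand and simply takes terminal windows; the function names, the default length of 20 nucleotides, and the example sequences are hypothetical, and practical primer design would also have to confirm specificity toward the target cDNA.

```python
from __future__ import annotations

# Assumption: both end sequences are given 5'->3' on the sense (mRNA) strand.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
MIN_PRIMER_LENGTH = 15  # items (1)-(3) require at least 15 nucleotides


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primer_pair(five_prime_end: str, three_prime_end: str,
                       length: int = 20) -> tuple[str, str]:
    """Return hypothetical (5'-primer, 3'-primer) of `length` nucleotides.

    The 5'-primer copies the first `length` bases of the 5'-end sequence, so it
    is complementary to the complementary strand. The 3'-primer is the reverse
    complement of the last `length` bases of the 3'-end sequence, so it is
    complementary to the sense strand.
    """
    if length < MIN_PRIMER_LENGTH:
        raise ValueError(f"primers must be at least {MIN_PRIMER_LENGTH} nt")
    if len(five_prime_end) < length or len(three_prime_end) < length:
        raise ValueError("end sequence shorter than the requested primer length")
    forward = five_prime_end[:length].upper()
    reverse = reverse_complement(three_prime_end[-length:]).upper()
    return forward, reverse
```

For primer set (2), only the 5'-primer from this function would be used and paired with an oligo-dT primer.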

[0017] Tables 1 and 2 show the names of the clones obtained in the Examples described later, comprising the polynucleotides of the present invention (Table 1, 5547 clones; Table 2, 54 clones), the names of the nucleotide sequences at the 5'-end and 3'-end of each full-length cDNA, and their corresponding SEQ ID NOs. A blank indicates that the 3'-end sequence corresponding to the 5'-end sequence has not been determined for the same clone.

[0018] The SEQ ID NO of a 5'-end sequence is shown on the right side of the name of the 5'-end sequence, and the 
SEQ ID NO of a 3'-end sequence is shown on the right side of the name of the 3'-end sequence. 
Table 2 (continued)

Clone name      5'-end sequence   SEQ ID NO   3'-end sequence   SEQ ID NO
NT2RP2001214    F-NT2RP2001214    16136       R-NT2RP2001214    16190
NT2RP2001460    F-NT2RP2001460    16137       R-NT2RP2001460    16191
NT2RP2001756    F-NT2RP2001756    16138       R-NT2RP2001756    16192
NT2RP2002056    F-NT2RP2002056    16139       R-NT2RP2002056    16193
NT2RP2002677    F-NT2RP2002677    16140       R-NT2RP2002677    16194
NT2RP2002755    F-NT2RP2002755    16141       R-NT2RP2002755    16195
NT2RP2002843    F-NT2RP2002843    16142       R-NT2RP2002843    16196
NT2RP2003101    F-NT2RP2003101    16143       R-NT2RP2003101    16197
NT2RP2003799    F-NT2RP2003799    16144       R-NT2RP2003799    16198
NT2RP2004095    F-NT2RP2004095    16145       R-NT2RP2004095    16199
NT2RP2004732    F-NT2RP2004732    16146       R-NT2RP2004732    16200
NT2RP2004920    F-NT2RP2004920    16147       R-NT2RP2004920    16201
NT2RP2005454    F-NT2RP2005454    16148       R-NT2RP2005454    16202
NT2RP2005776    F-NT2RP2005776    16149       R-NT2RP2005776    16203
NT2RP2005806    F-NT2RP2005806    16150       R-NT2RP2005806    16204
NT2RP2005882    F-NT2RP2005882    16151       R-NT2RP2005882    16205
NT2RP3001282    F-NT2RP3001282    16152       R-NT2RP3001282    16206
NT2RP3001723    F-NT2RP3001723    16153       R-NT2RP3001723    16207
NT2RP3002099    F-NT2RP3002099    16154       R-NT2RP3002099    16208
NT2RP3003155    F-NT2RP3003155    16155       R-NT2RP3003155    16209
NT2RP3004028    F-NT2RP3004028    16156       R-NT2RP3004028    16210
OVARC1000008    F-OVARC1000008    16157       R-OVARC1000008    16211
OVARC1000724    F-OVARC1000724    16158       R-OVARC1000724    16212
OVARC1000751    F-OVARC1000751    16159       R-OVARC1000751    16213
OVARC1001029    F-OVARC1001029    16160       R-OVARC1001029    16214
PLACE1000814    F-PLACE1000814    16161       R-PLACE1000814    16215
PLACE1003030    F-PLACE1003030    16162       R-PLACE1003030    16216
PLACE1005549    F-PLACE1005549    16163       R-PLACE1005549    16217
PLACE1007218    F-PLACE1007218    16164       R-PLACE1007218    16218
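In every row of Table 2 shown above, the 3'-end SEQ ID NO equals the 5'-end SEQ ID NO plus 54, which matches the ranges 16111-16164 (5'-end) and 16165-16218 (3'-end) given for these 54 clones. The sketch below encodes that observed pattern as a lookup helper; the helper name is hypothetical, it is not part of the sequence listing, and it should not be applied to Table 1, where blanks mean no such fixed offset can be assumed.

```python
# Observed pattern in Table 2: 3'-end SEQ ID NO = 5'-end SEQ ID NO + 54.
FIVE_PRIME_RANGE = range(16111, 16165)   # 54 five-prime end sequences
THREE_PRIME_RANGE = range(16165, 16219)  # 54 three-prime end sequences
OFFSET = len(FIVE_PRIME_RANGE)           # 54

# A few rows copied from Table 2 for cross-checking.
ROWS = {
    "NT2RP2001214": (16136, 16190),
    "OVARC1000008": (16157, 16211),
    "PLACE1007218": (16164, 16218),
}


def three_prime_seq_id(five_prime_seq_id: int) -> int:
    """Map a Table 2 5'-end SEQ ID NO to its paired 3'-end SEQ ID NO."""
    if five_prime_seq_id not in FIVE_PRIME_RANGE:
        raise ValueError(f"{five_prime_seq_id} is not a Table 2 5'-end SEQ ID NO")
    return five_prime_seq_id + OFFSET
```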



[0019] Furthermore, the present invention relates to the use of the above primers, as described below.

(4) A polynucleotide which can be synthesized with the primer set of (2) or (3).

(5) A polynucleotide comprising a coding region in the polynucleotide of (4).

(6) A substantially pure protein encoded by the polynucleotide of (4).

(7) A partial peptide of the protein of (6).

[0020] In addition, the present invention comprises a polynucleotide described below and a protein encoded by the polynucleotide.

(8) An isolated polynucleotide selected from the group consisting of:

(a) a polynucleotide comprising a coding region of the nucleotide sequence set forth in any one of the SEQ ID NOs in Tables 350 and 351;

(b) a polynucleotide comprising a nucleotide sequence encoding a protein comprising the amino acid sequence set forth in any one of the SEQ ID NOs in Tables 350 and 351;

(c) a polynucleotide comprising a nucleotide sequence encoding a protein comprising an amino acid sequence selected from the amino acid sequences set forth in the SEQ ID NOs in Tables 350 and 351, in which one or more amino acids are substituted, deleted, inserted, and/or added, and which is functionally equivalent to the protein comprising said amino acid sequence selected from the amino acid sequences set forth in the SEQ ID NOs in Tables 350 and 351;

(d) a polynucleotide that hybridizes with a polynucleotide comprising a nucleotide sequence selected from the nucleotide sequences set forth in the SEQ ID NOs in Tables 350 and 351, and that comprises a nucleotide sequence encoding a protein functionally equivalent to the protein encoded by the nucleotide sequence selected from the nucleotide sequences set forth in the SEQ ID NOs in Tables 350 and 351;

(e) a polynucleotide comprising a nucleotide sequence encoding a partial amino acid sequence of a protein encoded by the polynucleotide of any one of (a) to (d); and

(f) a polynucleotide comprising a nucleotide sequence with at least [...] the nucleotide sequence set forth in any one of the SEQ ID NOs in Tables 350 and 351.

(9) A substantially pure protein encoded by the polynucleotide of (8).

(10) An antibody against the protein or peptide of any one of (6), (7), and (9). 
(11) A vector comprising the polynucleotide of (5) or (8).

(12) A transformant carrying the polynucleotide of (5) or (8), or the vector of (11).

(13) A transformant expressively carrying the polynucleotide of (5) or (8), or the vector of (11).

(14) A method for producing the protein or peptide of any one of (6), (7), and (9), comprising culturing the transformant of (13) and recovering the expression product.

(19) A method for synthesizing a polynucleotide, the method comprising:

a) synthesizing a complementary strand using a cDNA library as a template, and using the primer set of (2) 
or (3), or the primer of (16); and 

b) recovering the synthesized product. 

(20) The method of (19), wherein the cDNA library is obtainable by the oligo-capping method.

(21) The method of (19), wherein the complementary strand is obtainable by PCR.

(22) A method for detecting the polynucleotide of (8), the method comprising:

a) incubating a target polynucleotide with the oligonucleotide of (15) under conditions where hybridization occurs; and

b) detecting the hybridization of the target polynucleotide with the oligonucleotide of (15).

(23) A database of polynucleotides and [...]

[0021] Any patents, patent applications, and publications cited herein are incorporated by reference. 
BRIEF DESCRIPTION OF THE DRAWINGS 

[0022] Figure 1 shows the restriction maps of [...].

[0023] Figure 2 shows the reproducibility of [...]. The intensities of gene expression observed in independent sets of experiments are shown in the vertical axis as well as in the horizontal axis.

[0024] Figure 3 shows the detection limit in gene expression arrays. The intensity of expression is shown in the vertical axis and the concentration (µg/ml) of the probe used is shown in the horizontal axis.

DETAILED DESCRIPTION OF THE INVENTION

[0025] Herein, "polynucleotide" is defined as a molecule in which multiple nucleotides are polymerized. There are no limitations on the number of polymerized nucleotides; in case the polymer contains a relatively low number of nucleotides, it is also described as an "oligonucleotide". The polynucleotide or the oligonucleotide of the present invention can be a natural or chemically synthesized product. Alternatively, it can be synthesized using a template DNA by an enzymatic reaction such as PCR.

[0026] All the cDNAs provided by the invention are full-length cDNAs. Herein, a "full-length cDNA" is defined as a cDNA which contains both the ATG codon (the translation start site) and the stop codon. Accordingly, the untranslated regions, which are originally found upstream or downstream of the protein coding region in natural mRNA, may or may not be contained.

[0027] An "isolated polynucleotide" is a polynucleotide the structure of which is not identical to that of any naturally 
occurring nucleic acid or to that of any fragment of a naturally occurring genomic nucleic acid spanning more than 
three separate genes. The term therefore covers, for example, 

(a) a DNA which has the sequence of part of a naturally occurring genomic DNA molecule but is not flanked by 
both of the coding sequences that flank that part of the molecule in the genome of the organism in which it naturally 
occurs; 

(b) a nucleic acid incorporated into a vector or into the genomic DNA of a prokaryote or eukaryote in a manner 
such that the resulting molecule is not identical to any naturally occurring vector or genomic DNA; 

(c) a separate molecule such as a cDNA, a genomic fragment, a fragment produced by polymerase chain reaction 
(PCR), or a restriction fragment; and 

(d) a recombinant nucleotide sequence that is part of a hybrid gene, i.e., a gene encoding a fusion protein.

Specifically excluded from this definition are nucleic acids present in mixtures of different (i) DNA molecules, (ii) transfected cells, or (iii) cell clones, e.g., as these occur in a DNA library such as a cDNA or genomic DNA library.

[0028] The term "substantially pure" as used herein in reference to a given polypeptide means that the protein or polypeptide is substantially free from other biological macromolecules. The substantially pure protein or polypeptide is at least 75% (e.g., at least 80, 85, 95, or 99%) pure by dry weight. Purity can be measured by any appropriate standard method, for example, by column chromatography, polyacrylamide gel electrophoresis, or HPLC analysis.
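The dry-weight criterion above is a simple ratio. A minimal sketch, assuming the mass of the protein of interest and the total dry mass of the preparation have already been measured (for example by one of the methods listed above); the function names and the use of milligrams are illustrative only.

```python
# Percent purity by dry weight, checked against the 75% floor stated above.
SUBSTANTIALLY_PURE_MIN = 75.0  # percent by dry weight


def percent_purity(target_mass_mg: float, total_dry_mass_mg: float) -> float:
    """Mass of the protein of interest as a percentage of the total dry mass."""
    if total_dry_mass_mg <= 0 or not 0 <= target_mass_mg <= total_dry_mass_mg:
        raise ValueError("require total > 0 and 0 <= target <= total")
    return 100.0 * target_mass_mg / total_dry_mass_mg


def is_substantially_pure(target_mass_mg: float, total_dry_mass_mg: float,
                          minimum: float = SUBSTANTIALLY_PURE_MIN) -> bool:
    """True if the preparation meets the given purity floor (75% by default)."""
    return percent_purity(target_mass_mg, total_dry_mass_mg) >= minimum
```

Passing `minimum=95.0` or `minimum=99.0` checks the stricter example levels.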
[0029] All the clones (5602 clones) of the present invention are novel and encode full-length proteins. All the clones were prepared by the oligo-capping method, which can achieve cDNA cloning with a high fullness ratio. The cDNA clones were selected by using the ATGpr1 score as an index of the fullness ratio at the 5'-end, based on the sequence features of the 5'-end sequences. Selection was further carried out by searching the GenBank database for EST sequences homologous to the 5'-end sequence of each clone by BLAST [S.F. Altschul, W. Gish, W. Miller, E.W. Myers & D.J. Lipman, J. Mol. Biol., 215:403-410 (1990); W. Gish & D.J. States, Nature Genet., 3:266-272 (1993)] and by considering the number of matching (identical) EST sequences or the number of continuous amino acids in the 5'-end sequence initiated from the initiation codon.
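The paragraph above uses the number of continuous amino acids read from the initiation codon as one selection criterion, and the ATGpr2 score described later draws on hexanucleotide frequencies in the region between the ATG and the stop codon (at most 300 nucleotides). A minimal sketch of extracting that region and tallying in-frame hexanucleotides follows. It is not the ATGpr scoring model; counting hexamers at codon boundaries (adjacent codon pairs) is an assumption, and the function names and example sequences are hypothetical.

```python
from __future__ import annotations

from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_from_atg(seq: str, atg_position: int, max_nt: int | None = None) -> str:
    """In-frame region from the ATG up to, but not including, the first stop codon.

    Without a stop codon the region runs to the last complete codon. `max_nt`
    optionally caps the length (the ATGpr2 description uses at most 300 nt).
    """
    seq = seq.upper()
    if seq[atg_position:atg_position + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    end = atg_position + 3 * ((len(seq) - atg_position) // 3)
    for i in range(atg_position, end, 3):
        if seq[i:i + 3] in STOP_CODONS:
            end = i
            break
    if max_nt is not None:
        end = min(end, atg_position + max_nt)
    return seq[atg_position:end]


def continuous_amino_acids(seq: str, atg_position: int) -> int:
    """Number of codons (amino acids, including Met) read before the first stop."""
    return len(orf_from_atg(seq, atg_position)) // 3


def in_frame_hexamer_counts(region: str) -> Counter:
    """Count hexanucleotides starting at each codon boundary (codon pairs)."""
    return Counter(region[i:i + 6] for i in range(0, len(region) - 5, 3))
```

For example, in `"CCATGAAACCCGGGTAAGG"` with the ATG at offset 2, the region is `ATGAAACCCGGG` (4 amino acids) and stops before the in-frame `TAA`.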

[0030] Moreover, homology searches using the 5'-end sequence showed that the clones are not identical to any known human mRNA (namely, they are novel).

[0031] The primers of the present invention, which are used for synthesizing full-length cDNA, are selected from the group consisting of SEQ ID NOs: 1-5547 (5'-primers) or SEQ ID NOs: 5548-10463 (3'-primers). Further, the primers of the present invention, which are used for synthesizing full-length cDNA, are selected from SEQ ID NOs: 16111-16164 (5'-primers) or SEQ ID NOs: 16165-16218 (3'-primers). Some of these nucleotide sequences include a known EST as a part. However, the primers of the present invention are novel in that they enable the synthesis of full-length cDNA. Because the known ESTs lack important information on which part of the cDNA they correspond to, it is impossible to design such primers on the basis of the ESTs.

[0032] All the full-length cDNAs of the present invention can be synthesized by a method such as PCR (Current Protocols in Molecular Biology (1987) Ausubel et al. edit, John Wiley & Sons, Section 6.1-6.4) using either a primer set comprising nucleotide sequences selected from both the 5'- and 3'-end sequences, or a set comprising a primer based on the 5'-end sequence and an oligo-dT primer.

[0033] Specifically, PCR can be performed using, as the 5'-primer, an oligonucleotide that is 15 nucleotides or longer and specifically hybridizes with the complementary strand of the polynucleotide containing the nucleotide sequence selected from the 5'-end sequences shown in Tables 1 and 2 (SEQ ID NOs: 1-5547 or SEQ ID NOs: 16111-16164), and an oligo-dT primer as the 3'-primer. The length of the primers is usually 15-100 nucleotides, and favorably 15-35 nucleotides. In the case of LA PCR, which is described below, a primer length of 25-35 nucleotides may provide a good result.
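The length ranges above, together with the design rule in [0034] that the amplification product should include the translation start site, can be expressed as two small checks. A minimal sketch, assuming the ATG position has already been predicted (for example with ATGpr) and is passed in as a 0-based offset into the 5'-end sequence; the function names and category labels are hypothetical.

```python
def classify_primer_length(length: int, la_pcr: bool = False) -> str:
    """Classify a primer length (in nucleotides) against the ranges given above."""
    if not 15 <= length <= 100:
        return "outside usual range"
    if la_pcr:
        # 25-35 nucleotides "may provide a good result" for LA PCR
        return "recommended" if 25 <= length <= 35 else "usual"
    return "favorable" if length <= 35 else "usual"


def product_includes_start(primer_start: int, atg_position: int) -> bool:
    """True if a 5'-primer whose 5' end sits at `primer_start` lies at or upstream of the ATG.

    A PCR product begins at the primer's 5' end, so any 5'-primer starting at or
    before the ATG keeps the start codon in the amplified product.
    """
    return primer_start <= atg_position
```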
[0034] A method to design a primer that enables specific amplification based on a given nucleotide sequence is known to those skilled in the art (Current Protocols in Molecular Biology, Ausubel et al. edit, (1987) John Wiley & Sons, Section 6.1-6.4). In designing a primer based on the 5'-end sequence, the primer is designed so that, in principle, the amplification products will include the translation start site. Accordingly, when a given 5'-end nucleotide sequence is the 5'-untranslated region (5'UTR), any part of the sequence can be used as a 5'-primer as long as specificity toward the target cDNA is ensured. The translation start site can be predicted using a known method such as the ATGpr, as described below.

[0035] When synthesizing a polynucleotide, the target nucleotide sequence to be amplified can extend to several thousand bp in some cDNAs. However, it is possible to amplify such long nucleotides by using, for example, LA PCR (Long and Accurate PCR). It is advantageous to use LA PCR when synthesizing long DNA. In LA PCR, in which a special DNA polymerase having 3'-5' exonuclease activity is used, misincorporated nucleotides can be removed. Accordingly, accurate synthesis of the complementary strand can be achieved even for a long nucleotide sequence. By using LA PCR, amplification of a nucleotide sequence of 20 kb has been reported (Takeshi Hayashi (1996) Jikken-Igaku, Yodo-sha).

[0036] A template DNA for synthesizing the cDNA of the present invention can be obtained by using cDNA libraries that are prepared by various methods. The cDNA libraries used here are those with a high fullness ratio, which were obtained using a combination of (1) a method to prepare a full-length-enriched cDNA library using the oligo-capping method, and (2) a system for evaluating the fullness ratio from the 5'-end sequence (selection based on estimation by the ATGpr after removing [...]). However, it is possible to easily obtain a full-length cDNA by using the primers provided by the present invention, without the above-described specialized method. The problem with cDNA libraries prepared by known methods or commercially available is that the mRNA contained [...] desirable to synthesize a full-length cDNA [...].

[0037] The 5'-end sequence of the full-length cDNA clones of the present invention can be used to isolate the regulatory elements of transcription, including the promoter, on the genome. [...] as described in the Examples below.



(1) 1516 clones

Among the 3690 clones whose maximal ATGpr1 score is higher than 0.3, 1516 clones are novel full-length clones, in which at least either of the sequences of the 5'- and 3'-ends, or both, are not identical to those of any human EST.

(2) 377 clones

Among the 3690 clones, 377 clones are novel full-length clones, in which the number of human ESTs having identical sequence at both the 5'- and 3'-ends is 1 to 5.

(3) 1797 clones

Among the 3690 clones, 1797 clones are novel full-length clones, in which the number of human ESTs having identical sequence at the 5'-end is not more than 20 (except the clones described above).

(4) 453 clones

Among the 1857 clones in which the maximal ATGpr1 score is 0.3 or less, 453 clones are estimated to be novel full-length clones, since their maximal ATGpr2 scores are higher than 0.3 and at least either of the sequences of the 5'- and 3'-ends, or both, are not identical to those of any human EST. The ATGpr2 score is determined by using the ATGpr program together with the frequency information of the six nucleotides (hexanucleotides) contained within the sequence between the ATG codon and the stop codon (the maximal length is 300 nucleotides from the ATG codon) (Salamov A.A. et al., Bioinformatics, 14: 384-390; http://www.hri.co.jp/atgpr/). The ATGpr program used for calculating the ATGpr2 score is referred to as the ATGpr2 program in the following.



(5) 24 clones

Among the 1857 clones, 24 clones are estimated to be full-length since their maximal ATGpr2 scores are higher than 0.3, though they have low scores in the ATGpr1 program, and they are also novel, in that the number of human ESTs having identical sequence at both the 5'- and 3'-ends is 1 to 5.



(6) 65 clones

Among the 1857 clones, 65 clones are estimated to be full-length since, though they have low scores in both programs, ATGpr1 and ATGpr2, their scores are the maximum in comparison to those of the other clones in the same cluster (at least two clones). The clones are also novel, in that at least either of the sequences of the 5'- and 3'-ends, or both, are not identical to those of any human EST.



(7) 32 clones

Among the 1857 clones, 32 clones are estimated to be full-length since, though they have low scores in both programs, ATGpr1 and ATGpr2, their scores are the maximum in comparison to those of the other clones in the same cluster (at least two clones). The clones are also novel, in that the number of human ESTs having identical sequence at both the 5'- and 3'-ends is 1 to 5.

(8) 36 clones

Among the 1857 clones, 36 clones are full-length, which were selected by assembling the sequences of the other clones or human ESTs, although they have low scores in both programs, ATGpr1 and ATGpr2. The clones are also novel, in that at least either of the sequences of the 5'- and 3'-ends, or both, are not identical to those of any human EST.

(9) 81 clones

Among the 1857 clones, 81 clones are full-length, which were selected by assembling the sequences of the other clones or human ESTs, although they have low scores in both programs, ATGpr1 and ATGpr2. The clones are also novel, in that the number of human ESTs having identical sequence at the 5'-end is not more than 20 (other than the clones in which at least either of the sequences of the 5'- and 3'-ends, or both, are not identical to those of any human EST).

(10) 938 clones

Among the 1857 clones, 938 clones are estimated to be full-length according to the fullness ratio shown in Table 4, although they have low scores in both programs, ATGpr1 and ATGpr2. The clones are also novel, in that at least the sequence of the 5'-end is not identical to those of any human EST.

(11) 228 clones

Among the 1857 clones, 228 clones are estimated to be full-length according to the fullness ratio shown in Table 7, although they have low scores in both programs, ATGpr1 and ATGpr2. The clones are also novel, in that at least the sequence of the 3'-end is not identical to those of any human EST.

(12) 3 clones

Three clones, HEMBA1006812, HEMBB1001871, and NT2RP3001282, whose maximal ATGpr1 values are higher than 0.3, are full-length and novel clones whose 5'-end sequences presumably contain a coding region which is initiated with the ATG codon and which encodes 100 amino acids or more.
(13) 52 clones

The following 52 clones, which have maximal ATGpr1 values of 0.3 or less, are full-length with the fullness ratios shown in Table 4, although the fullness ratios are low:
HEMBA1000497, HEMBA1001750, HEMBA1003854, HEMBA1004193, HEMBA1004860, HEMBA1005572,
HEMBA1006038, HEMBA1006092, HEMBA1006406, HEMBA1006650, HEMBB1000672, HEMBB1001197,
MAMMA1001252, MAMMA1002094, NT2RM4000634, NT2RM4000657, NT2RM4000783, NT2RM4000857,
NT2RM4001178, NT2RM4002420, NT2RP2000198, NT2RP2000551, NT2RP2000660, NT2RP2001214,
NT2RP2001460, NT2RP2001756, NT2RP2002056, NT2RP2002677, NT2RP2002755, NT2RP2002843,
NT2RP2003101, NT2RP2003799, NT2RP2004095, NT2RP2004732, NT2RP2004920, NT2RP2005454,
NT2RP2005776, NT2RP2005806, NT2RP2005882, NT2RP3001723, NT2RP3002099, NT2RP3003155,
NT2RP3004028, OVARC1000008, OVARC1000724, OVARC1000751, OVARC1001029, PLACE1000814,
PLACE1003030, PLACE1005549, PLACE1007218, NT2RP4002298.

Moreover, these clones are novel clones whose 5'-end sequences presumably contain a coding region which is initiated with the ATG codon and which encodes 50 amino acids or more. Among them, the following 20 clones are predicted to contain a coding region with 100 amino acids or more and should encode proteins:
HEMBA1000497, HEMBA1003854, HEMBA1004193, NT2RM4000657, NT2RM4001178, NT2RP2001756,
NT2RP2002677, NT2RP2002755, NT2RP2002843, NT2RP2004095, NT2RP2004920, NT2RP2005806,
NT2RP3002099, NT2RP3003155, OVARC1000724, OVARC1001029, PLACE1000814, PLACE1003030,
PLACE1005549, PLACE1007218.
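The clone counts in items (1) to (13) can be cross-checked arithmetically against the 5602 clones stated in [0029] and the 5547 clones of Table 1. A small sketch; the grouping labels are illustrative, and the only inputs are the counts given in the items above.

```python
# Counts taken from items (1)-(13) above.
ATGPR1_HIGH = {"(1)": 1516, "(2)": 377, "(3)": 1797}          # maximal ATGpr1 score > 0.3
ATGPR1_LOW = {"(4)": 453, "(5)": 24, "(6)": 65, "(7)": 32,     # maximal ATGpr1 score <= 0.3
              "(8)": 36, "(9)": 81, "(10)": 938, "(11)": 228}
LISTED_SEPARATELY = {"(12)": 3, "(13)": 52}

high = sum(ATGPR1_HIGH.values())
low = sum(ATGPR1_LOW.values())
total = high + low + sum(LISTED_SEPARATELY.values())
print(high, low, high + low, total)  # prints: 3690 1857 5547 5602
```

The first two sums reproduce the 3690 and 1857 clone groups named in the items, their total equals the 5547 clones of Table 1, and adding items (12) and (13) gives the 5602 clones of [0029].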

[0039] The protein encoded by the polynucleotide of the invention can be prepared as a recombinant protein or as a natural protein. For example, the recombinant protein can be prepared by inserting the polynucleotide encoding the protein of the invention into a vector, introducing the vector into an appropriate host cell, and purifying the protein expressed within the transformed host cell, as described below. In contrast, the natural protein can be prepared, for example, by utilizing an affinity column to which an antibody against the protein of the invention is attached (Current Protocols in Molecular Biology (1987) Ausubel et al. edit, John Wiley & Sons, Section 16.1-16.19). The antibody used for affinity purification may be either a polyclonal antibody or a monoclonal antibody. Alternatively, in vitro translation (see, for example, "On the fidelity of mRNA translation in the nuclease-treated rabbit reticulocyte lysate system", Dasso M.C. and Jackson R.J. (1989) Nucleic Acids Res. 17: 3129-3144) may be used for preparing the protein of the invention.
[0040] Proteins functionally equivalent to the proteins of the present invention can be prepared based on the activities of the proteins of the present invention, clarified in the above-mentioned manner. Using the biological activity possessed by the protein of the invention as an index, it is possible to verify whether or not a particular protein is functionally equivalent to the protein of the invention by examining whether or not the protein has said activity.
[0041] Proteins functionally equivalent to the proteins of the present invention can be prepared by those skilled in the art, for example, by using a method for introducing mutations into the amino acid sequence of a protein (for example, site-directed mutagenesis (Current Protocols in Molecular Biology, edit, Ausubel et al., (1987) John Wiley & Sons, Section 8.1-8.5)). Besides, such proteins can be generated by spontaneous mutations. The present invention comprises proteins having one or more amino acid substitutions, deletions, insertions and/or additions in the amino acid sequences of the proteins of the present invention (Tables 350 and 351), as long as the proteins have functions equivalent to those of the proteins identified in the Examples described later.

[0042] There are no limitations on the number and sites of amino acid mutations, as long as the proteins maintain their functions. The number of mutations is typically 30% or less, 20% or less, or 10% or less of the total amino acids, preferably 5% or less or 3% or less, and more preferably 2% or less or 1% or less. From the viewpoint of maintaining the protein function, it is preferable that a substituted amino acid has a property similar to that of the original amino acid. For example, Ala, Val, Leu, Ile, Pro, Met, Phe and Trp are assumed to have similar properties to one another because they are all classified into the group of non-polar amino acids. Similarly, substitution can be performed among non-charged amino acids such as Gly, Ser, Thr, Cys, Tyr, Asn and Gln, among acidic amino acids such as Asp and Glu, and among basic amino acids such as Lys, Arg and His.
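The mutation ceilings and amino acid groups above can be checked mechanically. The following Python sketch is illustrative only and is not part of the described method: it assumes the two sequences are already aligned position by position with no insertions or deletions, counts substitutions, compares the substituted fraction with a chosen ceiling (for example the 10% or 5% levels above), and reports whether each substitution stays inside one of the four property groups named in the paragraph. The function names and example sequences are hypothetical.

```python
from typing import List, Optional, Tuple

# Property groups as listed in paragraph [0042], using one-letter residue codes.
GROUPS = {
    "non-polar": set("AVLIPMFW"),
    "non-charged": set("GSTCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def group_of(residue: str) -> Optional[str]:
    """Return the property group of a one-letter residue code, or None if unlisted."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    return None


def summarize_substitutions(
    original: str, variant: str, max_fraction: float = 0.05
) -> Tuple[float, bool, List[Tuple[int, str, str, bool]]]:
    """Compare two ungapped, equal-length sequences position by position.

    Returns (fraction_substituted, within_limit, substitutions), where each
    substitution is (1-based position, original residue, new residue,
    True if both residues fall in the same property group).
    Insertions and deletions are not handled by this sketch.
    """
    if len(original) != len(variant):
        raise ValueError("sequences must be ungapped and of equal length")
    substitutions = []
    for pos, (old, new) in enumerate(zip(original, variant), start=1):
        if old != new:
            same_group = group_of(old) is not None and group_of(old) == group_of(new)
            substitutions.append((pos, old, new, same_group))
    fraction = len(substitutions) / len(original)
    return fraction, fraction <= max_fraction, substitutions


# Hypothetical 22-residue example: I6L (non-polar to non-polar) and R21K (basic to basic).
frac, ok, subs = summarize_substitutions(
    "MKTAYIAKQRQISFVKSHFSRQ", "MKTAYLAKQRQISFVKSHFSKQ", max_fraction=0.10
)
print(f"{frac:.1%} substituted, within 10%: {ok}")
for pos, old, new, same_group in subs:
    print(pos, old, new, "conservative" if same_group else "non-conservative")
```

With two changes in 22 residues the variant passes a 10% ceiling but not a 5% one; both substitutions stay within a single property group.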

[0043] In addition, proteins functionally equivalent to the proteins of the present invention can be isolated by using hybridization or gene amplification techniques known to those skilled in the art. Specifically, using the hybridization technique (Current Protocols in Molecular Biology, edit, Ausubel et al., (1987) John Wiley & Sons, Section 6.3-6.4), those skilled in the art can usually isolate a DNA highly homologous to the DNA encoding the protein identified in the present Example based on the identified nucleotide sequence (Tables 350 and 351) or a portion thereof, and obtain the functionally equivalent protein from the isolated DNA. The present invention includes proteins encoded by DNAs hybridizing with the DNAs encoding the proteins identified in the present Example, as long as the proteins are functionally equivalent to the proteins identified in the present Example. Organisms from which the functionally equivalent proteins are isolated are illustrated by vertebrates such as human, mouse, rat, rabbit, pig and bovine, but are not limited to these animals.

[0044] Washing conditions of hybridization for the isolation of DNAs encoding the functionally equivalent proteins are usually "1 x SSC, 0.1% SDS, 37°C"; more stringent conditions are "0.5 x SSC, 0.1% SDS, 42°C"; and still more stringent conditions are "0.1 x SSC, 0.1% SDS, 65°C". Alternatively, the following conditions can be given as hybridization conditions of the present invention: hybridization at "6 x SSC, 40% formamide, 25°C" and washing at "1 x SSC, 55°C". More preferable conditions are hybridization at "6 x SSC, 40% formamide, 37°C" and washing at "0.2 x SSC, 55°C". Even more preferable are hybridization at "6 x SSC, 50% formamide, 37°C" and washing at "0.1 x SSC, 62°C". The more stringent the hybridization conditions are, the more frequently DNAs highly homologous to the probe sequence are isolated. Therefore, it is preferable to conduct hybridization under stringent conditions. Examples of stringent conditions in the present invention are washing conditions of "0.5 x SSC, 0.1% SDS, 42°C", or alternatively, hybridization conditions of "6 x SSC, 40% formamide, 37°C" and washing at "0.2 x SSC, 55°C". However, the above-mentioned combinations of SSC, SDS and temperature conditions are indicated just as examples. Those skilled in the art can select hybridization conditions with stringency similar to those mentioned above by properly combining the above-mentioned or other factors that determine the stringency of hybridization (for example, probe concentration, probe length and duration of the hybridization reaction).

[0045] The amino acid sequences of proteins isolated by using the hybridization techniques usually exhibit high homology to those of the proteins of the present invention, which are shown in Tables 350 and 351. The present invention encompasses a polynucleotide comprising a nucleotide sequence that has high identity to the nucleotide sequence of claim 8 (a).
Furthermore, the present invention encompasses a peptide or protein comprising an amino acid sequence that has high identity to the amino acid sequence encoded by the polynucleotide of claim 8 (b). The term "high identity" indicates a sequence identity of at least 40%, preferably 60% or more, and more preferably 70% or more. Still more preferable is an identity of 90% or more, 93% or more, or 95% or more, and further 97% or more, or 99% or more. The identity can be determined by using the BLAST search algorithm.
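A simple way to see how the identity levels in [0045] are applied is to compute percent identity over an existing pairwise alignment. The sketch below is an illustration under stated assumptions, not BLAST: it takes two already-aligned sequences ('-' for gaps), divides identical columns by the alignment length, and reports the highest of the listed levels that is reached. The function names and the example alignment are hypothetical.

```python
from typing import Optional

# Identity levels (percent) named in paragraph [0045], lowest first.
THRESHOLDS = [40, 60, 70, 90, 93, 95, 97, 99]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences, with '-' marking gaps.

    Identical non-gap columns are divided by the full alignment length
    (gap columns included). Other normalizations, e.g. by the shorter
    ungapped sequence, give different values.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def highest_level_met(identity: float) -> Optional[int]:
    """Highest listed identity level reached by `identity`, or None if below 40%."""
    met = [level for level in THRESHOLDS if identity >= level]
    return met[-1] if met else None


# Hypothetical alignment: 9 identical columns out of 11 (one mismatch, one gap).
pid = percent_identity("MKTAYIAK-QR", "MKTAYLAKWQR")
print(f"{pid:.1f}% identity; highest level met: {highest_level_met(pid)}%")
```

The choice of denominator matters: the same pair scores differently if identities are divided by the length of one ungapped sequence instead of the alignment length.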

[0046] With the gene amplification technique (PCR) (Current Protocols in Molecular Biology, edit, Ausubel et al., (1987) John Wiley & Sons, Section 6.3-6.4) using primers designed based on the nucleotide sequence (Tables 350 and 351) or a portion thereof identified in the present Example, it is possible to isolate a DNA fragment highly homologous to the polynucleotide sequence or a portion thereof, and to obtain a protein functionally equivalent to a particular protein identified in the present Example based on the isolated DNA fragment.

[0047] The "percent identity" of two amino acid sequences or of two nucleic acids is determined using the algorithm of Karlin and Altschul (Proc. Natl. Acad. Sci. USA 87: 2264-2268, 1990), modified as in Karlin and Altschul (Proc. Natl. Acad. Sci. USA 90: 5873-5877, 1993). Such an algorithm is incorporated into the BLASTN and BLASTX programs of Altschul et al. (J. Mol. Biol. 215: 403-410, 1990). BLAST nucleotide searches are performed with the BLASTN program, score = 100, wordlength = 12. BLAST protein searches are performed with the BLASTX program, score = 50, wordlength = 3. When gaps exist between two sequences, Gapped BLAST is utilized as described in Altschul et al. (Nucleic Acids Res. 25: 3389-3402, 1997). When utilizing the BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) are used. See http://www.ncbi.nlm.nih.gov.
[0048] The present invention also includes a partial peptide of the proteins of the invention. The partial peptide includes a protein generated by removal of the signal peptide from a secretory protein. If the protein of the present invention has activity as a receptor or a ligand, the partial peptide may function as a competitive inhibitor of the protein and may bind to the receptor (or ligand). In addition, the present invention comprises an antigen peptide for raising antibodies. For the peptides to be specific for the protein of the invention, the peptides comprise at least 7 amino acids, preferably 8 amino acids or more, more preferably 9 amino acids or more, and even more preferably 10 amino acids or more. The peptide can be used for preparing antibodies against the protein of the invention or competitive inhibitors of it, and also for screening for a receptor that binds to the protein of the invention. The partial peptides of the invention can be produced, for example, by genetic engineering methods, known methods for synthesizing peptides, or digestion of the protein of the invention with an appropriate peptidase.
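Candidate antigen peptides of the minimum length given in [0048] can be enumerated with a sliding window. In this Python sketch, which is an illustration only, specificity is approximated by discarding any window that also occurs verbatim in a set of other sequences supplied by the user; real epitope selection would need further criteria (such as immunogenicity and near-identical matches) that are not modeled here. Function names and sequences are hypothetical.

```python
from typing import List, Sequence, Tuple

MIN_LENGTH = 7  # lower bound stated in paragraph [0048]


def candidate_peptides(protein: str, length: int = MIN_LENGTH) -> List[Tuple[int, str]]:
    """Every contiguous window of `length` residues as (1-based start, peptide)."""
    if length < MIN_LENGTH:
        raise ValueError(f"peptides shorter than {MIN_LENGTH} residues are outside the stated range")
    return [(i + 1, protein[i:i + length]) for i in range(len(protein) - length + 1)]


def unique_peptides(
    protein: str, others: Sequence[str], length: int = MIN_LENGTH
) -> List[Tuple[int, str]]:
    """Keep only windows that do not occur verbatim in any of the other sequences."""
    return [
        (start, pep)
        for start, pep in candidate_peptides(protein, length)
        if not any(pep in other for other in others)
    ]


# Hypothetical sequences: the middle 8-mer also occurs in the unrelated protein.
print(unique_peptides("MKTAYIAKQR", ["XXKTAYIAKQXX"], length=8))
```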

[0049] The present invention also relates to a vector into which the DNA of the invention is inserted. The vector of the invention is not limited as long as it stably contains the inserted DNA. For example, if E. coli is used as a host, vectors such as the pBluescript vector (Stratagene) are preferable as cloning vectors. To produce the protein of the invention, expression vectors are especially useful. Any expression vector can be used as long as it is capable of expressing the protein in vitro, in E. coli, in cultured cells, or in vivo. For example, the pBEST vector (Promega) is preferable for in vitro expression, the pET vector (Invitrogen) for E. coli, the pME18S-FL3 vector (GenBank Accession No. AB009864) for cultured cells, and the pME18S vector (Mol. Cell. Biol. (1988) 8: 466-472) for in vivo expression. To insert the DNA of the invention, ligation utilizing restriction sites can be performed according to the standard method (Current Protocols in Molecular Biology (1987) Ausubel et al. edit, John Wiley & Sons, Section 11.4-11.11).

[0050] The present invention also relates to a transformant carrying the vector of the invention. Any cell can be used as a host into which the vector of the invention is inserted, and various kinds of host cells can be used depending on the purpose. For strong expression of the protein in eukaryotic cells, COS cells or CHO cells can be used, for example.
[0051] Introduction of the vector into host cells can be performed, for example, by the calcium phosphate precipitation method, electroporation (Current Protocols in Molecular Biology (1987) Ausubel et al. edit, John Wiley & Sons, Section 9.1-9.9), the Lipofectamine method (GIBCO-BRL), or microinjection.

[0052] The primer of the present invention can be used for synthesizing full-length cDNA, and also for the detection and/or diagnosis of abnormalities of the protein of the invention encoded by the full-length cDNA. For example, by polymerase chain reaction (genomic DNA-PCR or RT-PCR) using the primer of the invention, DNA encoding the protein of the invention can be amplified. It is also possible to obtain the 5'-upstream expression regulatory region by PCR or hybridization, since the transcription start site within the genomic sequence can be easily specified based on the 5'-end sequence of the full-length cDNA. The obtained genomic region can be used for detection and/or diagnosis of sequence abnormalities by RFLP analysis, SSCP or direct sequencing.
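The relationship between a full-length cDNA and a primer pair flanking it can be illustrated with a short Python sketch. This is not the primer design procedure of the invention: it simply assumes a forward primer taken from the 5' end and a reverse primer taken as the reverse complement of the 3' end, and reports GC content and a rough melting temperature by the Wallace rule (2°C per A/T, 4°C per G/C), which is only a coarse estimate for short oligonucleotides. The primer length, function names and example sequence are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(primer: str) -> float:
    """Percentage of G and C bases."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def end_primers(cdna: str, length: int = 20) -> Tuple[str, str]:
    """Forward primer = first `length` bases; reverse primer = reverse complement of the last `length` bases."""
    if len(cdna) < 2 * length:
        raise ValueError("cDNA too short for non-overlapping primers of this length")
    return cdna[:length].upper(), reverse_complement(cdna[-length:]).upper()


# Hypothetical 30-nt "cDNA" with 10-nt primers, just to show the mechanics.
fwd, rev = end_primers("ATGACCGGTA" + "C" * 10 + "TTGCAGCTAA", length=10)
for name, primer in (("forward", fwd), ("reverse", rev)):
    print(name, primer, f"GC={gc_content(primer):.0f}%", f"Tm~{wallace_tm(primer)}C")
```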
[0053] Furthermore, the "polynucleotide having a length of at least […] nucleotides, comprising a nucleotide sequence that is complementary to a polynucleotide comprising the nucleotide sequence of any one of SEQ ID NOs in Tables 350 and 351" […] the expression of the protein of the invention. To exert the antisense effect, the antisense polynucleotide has a length of […] bp or more, for example, 50 bp or more, preferably 100 bp or more, and more preferably 500 bp or more, and usually has a length of 3000 bp or less, preferably 2000 bp or less. The antisense DNA can be used in […] of the diseases that are caused by abnormality of the protein of the invention (abnormal function or abnormal expression). Said antisense DNA can be prepared, for example, by the phosphorothioate method (Stein (1988) "Physicochemical properties of phosphorothioate oligodeoxynucleotides," Nucleic Acids Res. 16: 3209-3221) based on the nucleotide sequence of the DNA encoding the protein (for example, the DNA set forth in any one of SEQ ID NOs in Tables 350 and 351).

[0054] The polynucleotide or antisense DNA of the present invention can be used in gene therapy, for example, by administering it to a patient by the in vivo or ex vivo method with virus vectors such as retrovirus vectors, adenovirus vectors and adeno-associated virus vectors, or non-virus vectors such as liposomes.
[0055] The present invention also relates to antibodies that bind to the protein of the invention.
There are no limitations on the form of the antibodies of the invention. They include polyclonal antibodies, monoclonal antibodies, and portions thereof that can bind to the protein of the invention. They also include antibodies of all classes. Furthermore, special antibodies such as humanized antibodies are also included.

[0056] The polyclonal antibody of the invention can be obtained according to the standard method by synthesizing an oligopeptide corresponding to the amino acid sequence and immunizing rabbits with the peptide (Current Protocols in Molecular Biology (1987) Ausubel et al. edit, John Wiley & Sons, Section 11.12-11.13). The monoclonal antibody of the invention can be obtained according to the standard method by purifying the protein expressed in E. coli, immunizing mice with the protein, and producing hybridoma cells by fusing the spleen cells with myeloma cells (Current Protocols in Molecular Biology (1987) Ausubel et al. edit, John Wiley & Sons, Section 11.4-11.11).
[0057] The antibody binding to the protein of the present invention can be used for purification of the protein of the invention and also for detection and/or diagnosis of abnormalities in the expression and structure of the protein. Specifically, proteins can be extracted, for example, from tissues, blood, or cells, and the protein of the invention is detected by Western blotting, immunoprecipitation, ELISA, etc. for the above purpose.

[0058] Furthermore, the antibody binding to the protein of the present invention can be utilized for treating diseases associated with the protein of the invention. If the antibodies are used for treating patients, human antibodies or humanized antibodies are preferable in terms of their low antigenicity. The human antibodies can be prepared by immunizing a mouse whose immune system has been replaced with that of a human ("Functional transplant of megabase human immunoglobulin loci recapitulates human antibody response in mice," Mendez M.J. et al. (1997) Nat. Genet. 15: 146-156). The humanized antibodies can be prepared by recombination of the hypervariable region of a monoclonal antibody (Methods in Enzymology (1991) 203: 99-121).

[0059] The cDNA of the present invention encodes the amino acid sequence of a protein which is predicted to have the function(s) described below based on homology searches of GenBank and SwissProt. Specifically, for instance, as shown in the EXAMPLES, searching for a known gene or protein that is homologous to a partial sequence of the full-length cDNA of the invention (5602 clones) and referring to the function of that gene and of the protein it encodes make it possible to predict the function of the protein encoded by the cDNA of the invention. In this way, each of 1437 clones out of the 5602 full-length cDNA clones of the invention was predicted to encode a protein classified into one or more of the following categories.

Secretory or membrane protein (261 clones)
Glycoprotein-associated protein (113 clones)
Signal transduction-associated protein (148 clones)
Transcription-associated protein (233 clones)
Disease-associated protein (437 clones)
Enzyme or metabolism-associated protein (301 clones)
Cell division- or cell proliferation-associated protein (74 clones)
Cytoskeleton-associated protein (92 clones)
RNA synthesis-associated protein (280 clones)
Nuclear protein (352 clones)
Protein synthesis- or transport-associated protein (112 clones)
Cellular defense-associated protein (23 clones)
Development- or growth-associated protein (23 clones)
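Because a clone can be classified into more than one category, the counts listed above are not expected to add up to 1437. A quick tally in Python, using only the figures given in this paragraph, makes this explicit:

```python
# Clone counts per functional category, copied from the list in paragraph [0059].
category_counts = {
    "Secretory or membrane protein": 261,
    "Glycoprotein-associated protein": 113,
    "Signal transduction-associated protein": 148,
    "Transcription-associated protein": 233,
    "Disease-associated protein": 437,
    "Enzyme or metabolism-associated protein": 301,
    "Cell division- or cell proliferation-associated protein": 74,
    "Cytoskeleton-associated protein": 92,
    "RNA synthesis-associated protein": 280,
    "Nuclear protein": 352,
    "Protein synthesis- or transport-associated protein": 112,
    "Cellular defense-associated protein": 23,
    "Development- or growth-associated protein": 23,
}
classified_clones = 1437  # clones assigned to at least one category

assignments = sum(category_counts.values())
print(assignments)  # 2449
print(f"{assignments / classified_clones:.2f} categories per classified clone on average")
```

The 13 categories account for 2449 assignments over 1437 clones, that is, roughly 1.7 categories per classified clone on average.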

[0060] It is also possible to predict the protein function by looking into the amino acid sequence for motifs such
as the signal sequence, transmembrane region, nuclear translocation signal, glycosylation signal, phosphorylation site, zinc finger motif, and SH3 domain. The programs PSORT (Nakai K. and Kanehisa M. (1992) Genomics 14: 897-911), SOSUI (Hirokawa T. et al. (1998) Bioinformatics 14: 378-379) (Mitsui Information Developing Inc.), and MEMSAT (Jones D.T., Taylor W.R., and Thornton J.M. (1994) Biochemistry 33: 3038-3049) can be used to predict the existence of a signal sequence or transmembrane region. Alternatively, the function of the original protein can be predicted by fusing a partial amino acid sequence of the protein with another protein such as GFP, transfecting a construct encoding the fusion protein into cultured cells, and analyzing the localization of the fusion protein.
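PSORT, SOSUI and MEMSAT each use their own prediction methods, which are not reproduced here. As a much simpler, swapped-in illustration of how a transmembrane region can be flagged from sequence alone, the Python sketch below applies the Kyte-Doolittle hydropathy scale with a sliding window; the 19-residue window and the 1.6 threshold are the commonly cited Kyte-Doolittle heuristics and should be treated as assumptions. Function names and the example sequence are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 19) -> List[Tuple[int, float]]:
    """Mean hydropathy of each window as (1-based start, mean).

    Non-standard residue letters raise KeyError in this sketch.
    """
    scores = [KD[aa] for aa in protein.upper()]
    return [
        (i + 1, sum(scores[i:i + window]) / window)
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein: str, window: int = 19, threshold: float = 1.6) -> List[int]:
    """Start positions of windows whose mean hydropathy exceeds `threshold`."""
    return [start for start, mean in hydropathy_windows(protein, window) if mean > threshold]


# Hypothetical sequence: a 20-residue leucine stretch flanked by charged residues.
seq = "MKR" + "L" * 20 + "DEKR"
print(candidate_tm_windows(seq))
```

Overlapping windows that pass the threshold would normally be merged into a single predicted segment; that step is omitted for brevity.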

[0061] Based on the determined nucleotide sequences of the full-length cDNAs obtained in the present invention, it 
is possible to predict more detailed functions of the proteins encoded by the cDNA clones, for example, by searching 
the databases such as GenBank, Swiss-Prot and UniGene for homologies of the cDNAs; or by searching the amino 
acid sequences deduced from the full-length cDNAs for signal sequences by using software programs such as PSORT, 
for transmembrane regions by using software programs such as SOSUI or for motifs by using software programs such 
as Pfam (http://www.sanger.ac.uk/Software/Pfam/index.shtml) and PROSITE (http://www.expasy.ch/prosite/). As a
matter of course, the functions are often predictable by using partial sequence information (preferably 300 nucleotides 
or more) instead of the full-length nucleotide sequences. However, the result of the prediction by using partial nucleotide 
sequence does not always agree with the result obtained by using full-length nucleotide sequence, and thus, it is 
needless to say that the prediction of function is preferably performed based on the full-length nucleotide sequences. 
GenBank, Swiss-Prot and UniGene databases were searched for homologies of the full-length nucleotide sequences 
of the 4997 clones (see Example 18). The amino acid sequences deduced from the full-length nucleotide sequences 
were searched for functional domains by PSORT, SOSUI and Pfam. Prediction of functions of proteins encoded by the 
clones and the categorization thereof were performed based on these results obtained. 
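Motif searching of the kind performed with PROSITE can be sketched by translating a PROSITE pattern into a regular expression. The Python converter below handles only a subset of the PROSITE pattern syntax (x, [..], {..}, repeat counts and terminal anchors) and is an illustration, not a replacement for the PROSITE scanning tools; the example uses the N-glycosylation pattern N-{P}-[ST]-{P} (PROSITE PS00001) on a hypothetical sequence. Function names are hypothetical.

```python
import re
from typing import List


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern such as 'N-{P}-[ST]-{P}' to a Python regex.

    Supports x, [..], {..}, repeat counts such as x(2) or x(2,4), and the
    '<' / '>' terminal anchors on whole elements. Other PROSITE syntax
    (for example '>' inside brackets) is not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        count = re.fullmatch(r"(.+)\((\d+(?:,\d+)?)\)", element)
        if count:
            element, repeat = count.group(1), "{" + count.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a single residue or a [..] character class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(protein: str, pattern: str) -> List[int]:
    """1-based start positions of all (possibly overlapping) matches of a PROSITE pattern."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(protein.upper())]


# PROSITE PS00001 (N-glycosylation site) on a hypothetical sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("MNGSANPTNKTA", "N-{P}-[ST]-{P}"))
```

The lookahead wrapper lets overlapping occurrences be reported, which matters for short motifs in tandem.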
The following 798 clones were categorized into secretory and/or membrane proteins. 
HEMBA1000356, HEMBA1000518, HEMBA1000531, HEMBA1000637, HEMBA1000719,
HEMBA1000817, HEMBA1000822, HEMBA1000852, HEMBA1000870, HEMBA1000991, HEMBA1001052,
HEMBA1001071, HEMBA1001085, HEMBA1001286, HEMBA1001351, HEMBA1001407, HEMBA1001446,
HEMBA1001515, HEMBA1001557, HEMBA1001569, HEMBA1001661, HEMBA1001734, HEMBA1001746,
HEMBA1001866, HEMBA1002125, HEMBA1002150, HEMBA1002166, HEMBA1002417, HEMBA1002462,
HEMBA1002475, HEMBA1002477, HEMBA1002486, HEMBA1002609, HEMBA1002659, HEMBA1002661,
HEMBA1002780, HEMBA1002818, HEMBA1002876, HEMBA1002921, HEMBA1003071, HEMBA1003077,
HEMBA1003079, HEMBA1003086, HEMBA1003096, HEMBA1003281, HEMBA1003286, HEMBA1003538,
HEMBA1003711, HEMBA1003742, HEMBA1003803, HEMBA1004055, HEMBA1004143, HEMBA1004146,
HEMBA1004207, HEMBA1004341, HEMBA1004461, HEMBA1004577, HEMBA1004637, HEMBA1004752,
HEMBA1004756, HEMBA1004850, HEMBA1004889, HEMBA1004923, HEMBA1004930, HEMBA1005029,
HEMBA1005035, HEMBA1005050, HEMBA1005552, HEMBA1005576, HEMBA1005581, HEMBA1005588,
HEMBA1005616, HEMBA1005699, HEMBA1005991, HEMBA1006036, HEMBA1006038, HEMBA1006067,
HEMBA1006173, HEMBA1006198, HEMBA1006293, HEMBA1006310, HEMBA1006492, HEMBA1006502,
HEMBA1006583, HEMBA1006659, HEMBA1006758, HEMBA1006789, HEMBA1006921, HEMBA1006926,
HEMBA1006976, HEMBA1007203, HEMBA1007301, HEMBB1000037, HEMBB1000050, HEMBB1000054,
HEMBB1000175, HEMBB1000317, HEMBB1000556, HEMBB1000593, HEMBB1000631, HEMBB1000763,
HEMBB1000827, HEMBB1000915, HEMBB1000975, HEMBB1001112, HEMBB1001151, HEMBB1001177,
HEMBB1001302, HEMBB1001348, HEMBB1001564, HEMBB1001630, HEMBB1001871, HEMBB1001872,
HEMBB1001925, HEMBB1001962, HEMBB1002042, HEMBB1002044, HEMBB1002142, HEMBB1002190,
HEMBB1002193, HEMBB1002247, HEMBB1002383, HEMBB1002387, HEMBB1002550, HEMBB1002600,
HEMBB1002692, MAMMA1000045, MAMMA1000129, MAMMA1000133, MAMMA1000277, MAMMA1000278,
MAMMA1000410, MAMMA1000416, MAMMA1000472, MAMMA1000672, MAMMA1000684, MAMMA1000714,
MAMMA1000734, MAMMA1000778, MAMMA1000798, MAMMA1000842, MAMMA1000859, MAMMA1000897,
MAMMA1000956, MAMMA1001008, MAMMA1001030, MAMMA1001041, MAMMA1001073, MAMMA1001080,
MAMMA1001139, MAMMA1001154, MAMMA1001322, MAMMA1001388, MAMMA1001411, MAMMA1001487,
MAMMA1001751, MAMMA1001754, MAMMA1001771, MAMMA1002009, MAMMA1002427, MAMMA1002428,
MAMMA1002461, MAMMA1002524, MAMMA1002573, MAMMA1002598, MAMMA1002655, MAMMA1002684,
MAMMA1002769, MAMMA1002844, MAMMA1002881, MAMMA1002890, MAMMA1002938, MAMMA1002947,
MAMMA1003035, MAMMA1003089, MAMMA1003146, MAMMA1003150, NT2RM1000035, NT2RM1000037,
NT2RM1000062, NT2RM1000080, NT2RM1000092, NT2RM1000131, NT2RM1000199, NT2RM1000257,
NT2RM1000260, NT2RM1000355, NT2RM1000430, NT2RM1000563, NT2RM1000648, NT2RM1000742,
NT2RM1000770, NT2RM1000800, NT2RM1000811, NT2RM1000833, NT2RM1000857, NT2RM1000867,
NT2RM1000882, NT2RM1000905, NT2RM1001008, NT2RM1001115, NT2RM1001139, NT2RM2000259,
NT2RM2000260, NT2RM2000287, NT2RM2000395, NT2RM2000402, NT2RM2000407, NT2RM2000422,
NT2RM2000490, NT2RM2000522, NT2RM2000566,
PLACE1005884, PLACE1005934, PLACE1006076, PLACE1006119, PLACE1006159, PLACE1006164,
PLACE1006170, PLACE1006382, PLACE1006492, PLACE1006629, PLACE1006704, PLACE1006731,
PLACE1006760, PLACE1006779, PLACE1006795, PLACE1006805, PLACE1006962, PLACE1007045,
PLACE1007111, PLACE1007282, PLACE1007386, PLACE1007416, PLACE1007484, PLACE1007544,
PLACE1007645, PLACE1007743, PLACE1007746, PLACE1007807, PLACE1007858, PLACE1008002,
PLACE1008181, PLACE1008273, PLACE1008368, PLACE1008405, PLACE1008532, PLACE1008568,
PLACE1008625, PLACE1008696, PLACE1008867, PLACE1009027, PLACE1009039, PLACE1009045,
PLACE1009110, PLACE1009298, PLACE1009328, PLACE1009581, PLACE1009621, PLACE1009622,
PLACE1009637, PLACE1009925, PLACE1009935, PLACE1010089, PLACE1010106, PLACE1010152,
PLACE1010274, PLACE1010491, PLACE1010629, PLACE1010630, PLACE1010714, PLACE1010739,
PLACE1010891, PLACE1010896, PLACE1010925, PLACE1010965, PLACE1011026, PLACE1011046,
PLACE1011214, PLACE1011399, PLACE1011433, PLACE1011492, PLACE1011641, PLACE1011649,
PLACE1011719, PLACE1011762, PLACE1011858, PLACE1011923, PLACE2000014, PLACE2000039,
PLACE2000216, PLACE2000302, PLACE2000317, PLACE2000342, PLACE2000347, PLACE2000379,
PLACE3000121, PLACE3000124, PLACE3000160, PLACE3000242, PLACE3000271, PLACE3000353,
PLACE3000362, PLACE3000365, PLACE3000400, PLACE3000401, PLACE4000034, PLACE4000089,
PLACE4000522, PLACE4000558, SKNMC1000050, THYRO1000040, THYRO1000197, THYRO1000241,
THYRO1000327, THYRO1000394, THYRO1000585, THYRO1000596, THYRO1000625, THYRO1000805,
THYRO1001134, THYRO1001173, THYRO1001213, THYRO1001262, Y79AA1000037, Y79AA1000800,
Y79AA1000976, Y79AA1001078, Y79AA1001402, Y79AA1001585, Y79AA1001696, Y79AA1001711,
Y79AA1001827, Y79AA1001875, Y79AA1002027, Y79AA1002211, Y79AA1002234, Y79AA1002258.
[0099] On the other hand, the clones whose expression levels are decreased by RA/inhibitor are as follows:



HEMBA1000012, HEMBA1000501, HEMBA1000946, HEMBA1003220, HEMBA1003403, HEMBA1003569,
HEMBA1003591, HEMBA1003926, HEMBA1004168, HEMBA1004507, HEMBA1005009, HEMBA1005296,
HEMBA1005528, HEMBA1005570, HEMBA1006467, HEMBA1006486, HEMBA1006492, HEMBA1007322,
HEMBB1000055, HEMBB1000244, HEMBB1001665, MAMMA1000684, MAMMA1001139, MAMMA1001743,
NT2RM1000257, NT2RM1000318, NT2RM1000539, NT2RM1000666, NT2RM2000092, NT2RM2000192,
NT2RM2000371, NT2RM2000594, NT2RM4000511, NT2RM4001U0, NT2RM4001754, NT2RM4001905,
NT2RM4001940, NT2RM4002593, NT2RP1000086, NT2RP1000439, NT2RP1001073, NT2RP2000098,
NT2RP2000965, NT2RP2001397, NT2RP2002047, NT2RP2004226, NT2RP2004396, NT2RP2004655,
NT2RP2005126, NT2RP2005464, NT2RP2005712, NT2RP2005859, NT2RP2005890, NT2RP3000980,
NT2RP3001383, NT2RP3001621, NT2RP3002081, NT2RP3002181, NT2RP3002244, NT2RP3002590,
NT2RP3003059, NT2RP3004258, NT2RP3004378, NT2RP3004527, NT2RP3004594, NT2RP4001760,
NT2RP4001950, NT2RP4002047, NT2RP4002408, NT2RP5003459, OVARC1000004, OVARC1000035,
OVARC1000431, OVARC1001051, OVARC1001129, OVARC1001176, OVARC1001261, OVARC1001342,
OVARC1001942, OVARC1001943, PLACE1002171, PLACE1002465, PLACE1003190, PLACE1003375,
PLACE1004128, PLACE1005026, PLACE1005876, PLACE1005923, PLACE1007257, PLACE1007375,
PLACE1007507, PLACE1008941, PLACE1010624, PLACE1011090, PLACE1011219, THYRO1000270,
THYRO1000488, THYRO1000501, THYRO1000934, THYRO1001133, THYRO1001290, THYRO1001721,
Y79AA1000346, Y79AA1001228, Y79AA1001299, Y79AA1001541.

[0100] These clones are also associated with neural differentiation and are, therefore, candidates for genes associated with […].
[0101] If the protein encoded by the cDNA of the present invention is a regulatory factor of cellular conditions such as growth and differentiation, it can be used for developing medicines as follows. The protein or antibody provided by the invention is injected into a certain kind of cell by microinjection. Then, using the cells, it is possible to screen low-molecular-weight compounds by measuring the change in the cellular conditions or the activation or inhibition of a particular gene. The screening can be performed as follows. First, the protein is expressed and purified as a recombinant protein. The purified protein is microinjected into cells such as various cell lines or primary culture cells, and cellular changes such as growth and differentiation can be examined. Alternatively, the induction of genes whose expression is known to be associated with a particular change of cellular conditions may be detected by the amount of mRNA or protein. Or, the amount of intracellular molecules (low-molecular-weight compounds, etc.) that is changed by the function of a gene product (protein) known to be associated with a particular change of cellular conditions may be detected. The compounds to be screened (both low and high molecular weight compounds are acceptable) can be added to the culture medium and assessed for their activity by measuring the change of the cellular conditions. Instead of microinjection, cells introduced with the gene obtained in the invention can be used for the screening. […] a particular change in the cellular conditions, the change of the […] the function of the protein of the invention, it can be applied for developing medicines.

[0102] If the protein encoded by the cDNA of the present invention is a secretory protein, membrane protein, or 
protein associated with signal transduction, glycoprotein, transcription, or diseases, it can be used in functional assays 
for developing medicines. 

[0103] In the case of a membrane protein, it is most likely to be a protein that functions as a receptor or ligand on the cell surface. Therefore, it is possible to reveal a new relationship between a ligand and a receptor by screening the membrane protein of the invention based on its binding activity with known ligands or receptors. Screening can be performed according to known methods.

[0104] For example, a ligand for the protein of the invention can be screened in the following manner. Namely, a ligand that binds to a specific protein can be screened by a method comprising the steps of: (a) contacting a test sample with the protein of the invention or a partial peptide thereof, or with cells expressing these, and (b) selecting a test sample that binds to said protein, said partial peptide, or said cells.

[0105] On the other hand, screening using cells that express a receptor protein for the protein of the present invention can also be performed, for example, as follows. Receptors capable of binding to a specific protein can be screened by a method comprising the steps of (a) contacting sample cells with the protein of the invention or its partial peptide, and (b) selecting cells that bind to said protein or its partial peptide.

[0106] In the following example of screening, the protein of the invention is first expressed and the recombinant protein is purified. Next, the purified protein is labeled, a binding assay is performed using various cell lines or primary cultured cells, and cells that express a receptor are selected (Growth and differentiation factors and their receptors, Shin-Seikagaku Jikken Kouza Vol. 7 (1991) Honjyo, Arai, Taniguchi, and Muramatsu edit, p. 203-236, Tokyo-Kagaku-Doujin). The protein of the invention can be labeled with a radioisotope (RI) such as 125I, or with an enzyme such as alkaline phosphatase. Alternatively, the protein of the invention may be used without labeling and then detected by using a labeled antibody against the protein. The cells selected by the above screening methods, which express a receptor for the protein of the invention, can be used for further screening of agonists or antagonists of said receptor.

[0107] Once a ligand binding to the protein of the invention, a receptor of the protein of the invention, or cells expressing the receptor are obtained by screening, it is possible to screen for a compound that binds to the ligand and the receptor. It is also possible, by utilizing these binding activities, to screen for a compound that can inhibit the binding between them (for example, agonists or antagonists of the receptor).

[0108] When the protein of the invention is a receptor, the screening method comprises the steps of (a) contacting the protein of the invention, or cells expressing the protein of the invention, with the ligand in the presence of a test sample, (b) detecting the binding activity between said protein (or the cells expressing said protein) and the ligand, and (c) selecting a compound that reduces said binding activity compared with the activity in the absence of the test sample.
Furthermore, when the protein of the invention is a ligand, the screening method comprises the steps of (a) contacting the protein of the invention with its receptor, or with cells expressing the receptor, in the presence of a test sample, (b) detecting the binding activity between the protein and its receptor or the cells expressing the receptor, and (c) selecting a compound that can reduce the binding activity compared with the activity in the absence of the test sample.
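Step (c) of either method reduces to comparing each sample's binding signal with the signal measured in the absence of any sample. The Python sketch below is one way to express that comparison; the background-subtraction step, the 50 % cut-off, and all signal values are illustrative assumptions, not parameters given in the specification.

```python
def percent_inhibition(signal_with_sample: float,
                       signal_without_sample: float,
                       background: float = 0.0) -> float:
    """Percent reduction in specific binding caused by a test sample."""
    window = signal_without_sample - background
    if window <= 0:
        raise ValueError("control binding must exceed background")
    return 100.0 * (1.0 - (signal_with_sample - background) / window)


def select_hits(signals: dict[str, float], control: float,
                background: float = 0.0,
                min_inhibition: float = 50.0) -> list[str]:
    """Step (c): keep samples that reduce binding by at least min_inhibition %."""
    return sorted(name for name, signal in signals.items()
                  if percent_inhibition(signal, control, background) >= min_inhibition)


control = 10000.0    # binding measured in the absence of any test sample
background = 500.0   # hypothetical signal with no binding partner present
signals = {"cmpd_1": 9800.0, "cmpd_2": 3100.0, "cmpd_3": 5200.0}
print(select_hits(signals, control, background))  # prints ['cmpd_2', 'cmpd_3']
```

Expressing the result as percent inhibition rather than raw signal lets samples from different plates or runs be compared against a common threshold.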
[0109] Samples to be screened include, for example, cell extracts, expression products of a gene library, synthetic low-molecular-weight compounds, synthetic peptides, and natural compounds, but are not limited to these. A compound isolated by the above screening using a binding activity of the protein of the invention can also be used as a sample.

[0110] A compound isolated by the screening may be a candidate agonist or antagonist of the receptor of the protein. Whether the obtained compound is an agonist or an antagonist of the receptor can be determined with an assay that monitors changes in intracellular signaling, such as phosphorylation, resulting from reduced binding between the protein and its receptor. The compound may also be a candidate molecule that inhibits the interaction between the protein and its associated proteins (including a receptor) in vivo. Such compounds can be used to develop drugs for the prevention or treatment of diseases with which the protein is associated.
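The agonist/antagonist distinction described above can be illustrated with a simple decision rule over a phosphorylation readout. This Python sketch assumes four hypothetical measurements per compound and an arbitrary 1.5-fold threshold; the specification does not prescribe either, so treat the rule as an illustration only.

```python
def classify_compound(basal: float, compound_alone: float,
                      ligand_alone: float, ligand_plus_compound: float,
                      fold: float = 1.5) -> str:
    """Classify a screening hit from a phosphorylation readout (hypothetical rule).

    basal                -- signal in untreated cells
    compound_alone       -- signal with the compound only
    ligand_alone         -- signal with the ligand only
    ligand_plus_compound -- signal with ligand and compound together
    """
    if compound_alone >= fold * basal:
        # The compound triggers receptor signaling by itself.
        return "agonist candidate"
    if ligand_alone >= fold * ligand_plus_compound:
        # The compound is silent alone but blunts the ligand-induced response.
        return "antagonist candidate"
    return "unclassified"


print(classify_compound(100.0, 400.0, 500.0, 480.0))  # agonist candidate
print(classify_compound(100.0, 110.0, 500.0, 150.0))  # antagonist candidate
print(classify_compound(100.0, 105.0, 500.0, 450.0))  # unclassified
```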
[0111] Secretory proteins may regulate cellular conditions such as growth and differentiation. A novel factor that regulates cellular conditions can be found by adding the secretory protein of the invention to a certain kind of cell and performing a screening based on changes in cell growth or differentiation, or on activation of a particular gene.

[0112] The screening can be performed, for example, as follows. First, the protein of the invention is expressed and purified in a recombinant form. Then, the purified protein is added to various cell lines or primary cultured cells, and changes in cell growth and differentiation are monitored. The induction of a particular gene known to be involved in a certain cellular change is detected from the amounts of its mRNA and protein. Alternatively, the amount of an intracellular molecule (low-molecular-weight compounds, etc.) that is changed by the function of a gene product (protein) known to function in a certain cellular change is used for detection.

[0113] Once the screening reveals that the protein of the invention can regulate cellular conditions or functions, it is possible to apply the protein as a pharmaceutical and diagnostic medicine for associated diseases, either by itself or after altering a part of it into an appropriate composition.

[0114] As described above for membrane proteins, the secretory protein [...] a known ligand or receptor, [...] be used to explore a novel ligand-receptor interaction. A similar method can be used to identify an agonist or antagonist. The resulting compounds obtained by these methods can be candidates for compounds that inhibit the interaction between the protein of the invention and an interacting molecule (including a receptor). The compounds may be usable as preventive, therapeutic, and diagnostic medicines for diseases in which the protein may play a certain role.

[0115] Proteins associated with signal transduction or transcription may be factors that affect a certain protein or gene in response to intracellular/extracellular stimuli. It is possible to find a novel factor [...] gene by expressing the protein provided by the invention in certain types of cells and performing a screening utilizing [...].

[0116] [...] First, a transformed cell line expressing the protein is obtained. Then the transformed cell line and the untransformed original cell line are compared for changes in the expression of a certain gene by detecting the amount of its mRNA or protein. Alternatively, the amount of an intracellular molecule (e.g., low-molecular-weight compounds) that is changed by the function of a certain gene product (protein) may be used for the detection. Furthermore, a change in the expression of a certain gene can be detected by introducing into a cell a fusion gene that comprises a regulatory region of the gene and a marker gene (luciferase, beta-galactosidase, etc.), expressing the protein provided by the invention in the cell, and estimating the activity of the marker gene product.
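The marker-gene comparison above can be summarized as a fold change between cells that express the protein and the original cells. The Python sketch below assumes that each replicate's marker activity (for example, luciferase light units) is normalized to a per-well quantity such as total protein, which is a common practice but not stated in the text; all readings are hypothetical.

```python
from statistics import mean


def normalized_activity(marker_signals: list[float],
                        normalizers: list[float]) -> float:
    """Mean marker activity per unit of normalizer (e.g. total protein)."""
    if len(marker_signals) != len(normalizers) or not marker_signals:
        raise ValueError("need matching, non-empty replicate lists")
    return mean(m / n for m, n in zip(marker_signals, normalizers))


def fold_change(expressing: tuple[list[float], list[float]],
                control: tuple[list[float], list[float]]) -> float:
    """Normalized marker activity in protein-expressing cells vs. original cells."""
    return normalized_activity(*expressing) / normalized_activity(*control)


# (marker readings, normalizer per replicate); hypothetical values.
expressing = ([24000.0, 13000.0], [2.0, 1.0])  # cells expressing the protein
control = ([5000.0, 2500.0], [2.0, 1.0])       # untransformed original cells
print(fold_change(expressing, control))  # prints 5.0
```

A fold change well above 1 suggests that the protein activates the regulatory region fused to the marker gene; a value well below 1 suggests repression.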

[0117] If the protein or gene of the invention is associated with diseases, it is possible to screen for a gene or compound that can regulate its expression and/or activity, either directly or indirectly, by utilizing the protein of the present invention.
[0118] For example, the protein of the invention is expressed and purified as a recombinant protein. Then, a protein or gene that interacts with the protein of the invention is purified and screened based on the binding. Alternatively, [...] by adding a candidate inhibitor compound in advance and monitoring the change in binding. In another method, a transcription regulatory region located 5'-upstream of the gene encoding the protein of the invention, which is capable of regulating the expression of other genes, is obtained and fused with a marker gene. The fusion is introduced into a cell, and compounds are added to the cell to explore a [...].
[0119] [...] can be used for developing pharmaceutical and diagnostic medicines for the disease with which the protein of the present invention is associated. Similarly, if the regulatory factor obtained in the screening turns out to be a protein, compounds that can newly affect the expression or activity of that protein may be used as a medicine for the diseases with which the protein of the invention is associated.
[0120] If the protein of the invention has an enzymatic activity, regardless of whether it is a secretory protein, a membrane protein, or a protein associated with signal transduction, glycoproteins, transcription, or diseases, a screening can be performed by adding a compound to the protein of the invention and monitoring the change in the compound. The enzymatic activity may also be utilized to screen for a compound that can inhibit the activity of the protein.
[0121] In a screening given as an example, the protein of the invention is expressed and the recombinant protein is purified. Then, compounds are contacted with the purified protein, and the amounts of the compound and the reaction products are examined. Alternatively, compounds that are candidates for an inhibitor are pretreated, then a compound (substrate) that can react with the purified protein is added, and the amounts of the substrate and the reaction products are examined.
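The inhibitor screen above compares the amount of reaction product formed after pretreatment with each candidate against the amount formed by the purified protein alone. The Python sketch below expresses this as residual activity; the 30 % cut-off, the candidate names, and the product amounts are illustrative assumptions.

```python
def residual_activity(product_with_candidate: float,
                      product_without_candidate: float) -> float:
    """Product formed after candidate pretreatment, as % of the untreated reaction."""
    if product_without_candidate <= 0:
        raise ValueError("the uninhibited reaction must form product")
    return 100.0 * product_with_candidate / product_without_candidate


def inhibitor_hits(products: dict[str, float], uninhibited: float,
                   max_residual: float = 30.0) -> list[str]:
    """Candidates that leave at most max_residual % of the enzyme activity."""
    return sorted(name for name, amount in products.items()
                  if residual_activity(amount, uninhibited) <= max_residual)


uninhibited = 80.0  # hypothetical product (uM) formed by the purified protein alone
products = {"cand_A": 76.0, "cand_B": 12.0, "cand_C": 20.0}
print(inhibitor_hits(products, uninhibited))  # prints ['cand_B', 'cand_C']
```

Measuring the remaining substrate instead of the product would work the same way, with the comparison inverted.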
[0122] The compounds obtained in the screening may be used as a medicine for diseases with which the protein of the invention is associated. Also, they can be applied to tests that examine whether the protein of the invention functions [...] a specific antibody against the protein of [...] or not is determined [...] between the expression or activity of the protein and a certain disease.

[...] the expression and activity. Also, the proteins are useful in the [...] as a [...] patients as it is or after [...] a [...] vehicle, specifically sterilized water, [...] appropriately. The pharmaceutical [...] can be administered to patients by a method known to those skilled in the art, such as intraarterial, intravenous, [...] injections. The dosage may vary depending on the weight
or age of a patient, or the method of administration, but those skilled in the art can choose an appropriate dosage. If the compound is encoded by DNA, the DNA can be cloned into a vector for gene therapy and used for gene therapy. The dosage of the DNA and the method of its administration may vary depending on the weight or age of a patient, or the symptoms, but those skilled in the art can choose them appropriately.
[0126] The present invention further relates to databases comprising at least one polynucleotide and/or protein sequence selected from the nucleotide and/or amino acid sequence data indicated in Table 350 and Table 351, and to media on which such databases are recorded.

The term "database" means a collection of nucleotide sequence information accumulated in machine-searchable and machine-readable form. The databases of the present invention comprise at least one of the novel nucleotide sequences of the polynucleotides provided by the present invention. The databases of the present invention can consist solely of the sequence data of the novel polynucleotides provided by the present invention, or can also include information on the nucleotide sequences of known full-length cDNAs or ESTs. The databases of the present invention can contain not only nucleotide sequence information but also information on gene functions revealed by the present invention. Additional information, such as the names of the DNA clones carrying the full-length cDNAs, can be recorded with, or linked to, the sequence data in the databases.

[0127] The database of the present invention is useful for obtaining complete gene sequence information from partial sequence information of a gene of interest. Because the database contains the nucleotide sequence information of full-length cDNAs, comparing this information with the nucleotide sequence of a partial gene fragment obtained by the differential display method or the subtraction method makes it possible to obtain the full-length nucleotide sequence of interest, using the sequence of the partial fragment as a starting clue.
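The fragment-to-full-length lookup described above can be sketched as a search of the database for cDNAs that contain the fragment on either strand. This Python example assumes exact string matching for brevity; in practice such comparisons are usually done with alignment tools such as BLAST, which tolerate mismatches and sequencing errors. The clone names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_full_length(fragment: str,
                     database: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (clone_id, strand, 0-based start) for each full-length cDNA
    that contains the fragment exactly on either strand."""
    forward = fragment.upper()
    reverse = reverse_complement(forward)
    hits = []
    for clone_id, full_seq in database.items():
        seq = full_seq.upper()
        if forward in seq:
            hits.append((clone_id, "+", seq.find(forward)))
        elif reverse in seq:
            hits.append((clone_id, "-", seq.find(reverse)))
    return hits


database = {
    "clone_1": "ATGGCTAGCAAGGAGTTCGTTAA",
    "clone_2": "ATGCCCGGGTTTAAACCCTGA",
}
print(find_full_length("agcaaggag", database))  # [('clone_1', '+', 6)]
print(find_full_length("TAAACCCGG", database))  # [('clone_2', '-', 4)]
```

Checking the reverse complement matters because a fragment from differential display or subtraction may have been sequenced from either strand.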
[0128] The sequence information of the full-length cDNAs constituting the database of the present invention includes not only the complete sequences but also additional information, such as the expression frequency of the genes and their homology to known genes and proteins. This additional information facilitates rapid functional analysis of partial gene fragments. Furthermore, because information on human genes is accumulated in the database of the present invention, the database is useful for isolating the human homologue of a gene originating from another species. The human homologue can be isolated based on the nucleotide sequence of the gene from the original species.

[0129] At present, information on a wide variety of gene fragments can be obtained by the differential display method and the subtraction method. In general, these gene fragments are used as tools for isolating the corresponding full-length sequences. When a gene fragment corresponds to an already-known gene, the full-length sequence is easily obtained by comparing the partial sequence with the information in known databases. However, when the known databases contain no information corresponding to the partial sequence of interest, cDNA cloning must be carried out to obtain the full-length cDNA, and it is often difficult to obtain the full-length nucleotide sequence using the partial sequence information as the initial clue. If the full-length gene is not available, the amino acid sequence of the protein encoded by the gene remains unidentified. Thus, the database of the present invention can contribute to the identification of full-length cDNAs corresponding to gene fragments that cannot be identified using databases of known genes.
[0130] The present invention provides 5602 novel full-length cDNA clones and primers for synthesizing the cDNAs. Because the isolation of full-length human cDNAs has not yet progressed far, the invention is of great significance. The full-length cDNA clones contain the translation initiation site and thus provide useful information for the analysis of protein functions.

[0131] The cDNA clones are assumed to encode proteins that have important functions in vivo, such as secretory proteins, membrane proteins, signal transduction-associated proteins, glycoprotein-associated proteins, and transcription-associated proteins, and are also predicted to be associated with many diseases. Genes and proteins associated with diseases are useful for developing diagnostic markers or medicines that regulate their expression and activity, or as targets of gene therapy.

[0132] The invention is illustrated more specifically with reference to the following examples, but is not to be construed as being limited thereto.






EXAMPLE 1 



Construction of a cDNA library by the oligo-capping method. 



[0133] NT-2 neuron progenitor cells (Stratagene), a teratocarcinoma cell line from human embryo testis that can differentiate into neurons upon treatment with retinoic acid, were used.
The NT-2 cells were cultured according to the manufacturer's instructions as follows.

(1) NT-2 cells were cultured without induction by retinoic acid treatment (NT2RM1, NT2RM2, NT2RM4).

(2) After culture, NT-2 cells were induced by adding retinoic acid and then cultured for 48 hours (NT2RP1).
NO: 16216, SEQ ID NO: 16163 / SEQ ID NO: 16217, and SEQ ID NO: 16164 / SEQ ID NO: 16218.

4. A polynucleotide which can be synthesized with the primer set of claim 2 or 3. 

5. A polynucleotide comprising a coding region in the polynucleotide of claim 4. 

6. A substantially pure protein encoded by the polynucleotide of claim 4.

7. A partial peptide of the protein of claim 6. 

8. An isolated polynucleotide selected from the group consisting of 

(a) a polynucleotide comprising a coding region of the nucleotide sequence set forth in any one of the following SEQ ID NOs:






EP 1 074 617 A2 



SEQ ID NO: 10468, SEQ ID NO: 10470, SEQ ID NO: 10471, SEQ ID NO: 10472, SEQ ID NO: 10473,
SEQ ID NO: 10475, SEQ ID NO: 10477, SEQ ID NO: 10479, SEQ ID NO: [illegible], SEQ ID NO: 10483,
SEQ ID NO: 10485, SEQ ID NO: 10487, SEQ ID NO: [illegible], SEQ ID NO: 10489, SEQ ID NO: 10491,
SEQ ID NO: 10493, SEQ ID NO: 10495, SEQ ID NO: 10496, SEQ ID NO: 10497, SEQ ID NO: 10498,
SEQ ID NO: 10500, SEQ ID NO: 10502, SEQ ID NO: 10503, SEQ ID NO: 10504, SEQ ID NO: 10505,
SEQ ID NO: 10507, SEQ ID NO: 10508, SEQ ID NO: 10510, SEQ ID NO: 10511, SEQ ID NO: 10512,
SEQ ID NO: 10514, SEQ ID NO: 10516, SEQ ID NO: 10517, SEQ ID NO: 10519, SEQ ID NO: 10521,
SEQ ID NO: 10523, SEQ ID NO: 10524, SEQ ID NO: 10526, SEQ ID NO: 10528, SEQ ID NO: 10529,
SEQ ID NO: 10530, SEQ ID NO: 10532, SEQ ID NO: 10534, SEQ ID NO: 10535, SEQ ID NO: 10537,
SEQ ID NO: 10539, SEQ ID NO: 10540, SEQ ID NO: 10542, SEQ ID NO: 10543, SEQ ID NO: 10545,
SEQ ID NO: 10546, SEQ ID NO: 10548, SEQ ID NO: 10550, SEQ ID NO: 10551, SEQ ID NO: 10553,
SEQ ID NO: 10555, SEQ ID NO: 10556, SEQ ID NO: 10557, SEQ ID NO: 10558, SEQ ID NO: 10560,
SEQ ID NO: 10562, SEQ ID NO: 10564, SEQ ID NO: 10566, SEQ ID NO: 10567, SEQ ID NO: 10569,
SEQ ID NO: 10571, SEQ ID NO: 10573, SEQ ID NO: 10574, SEQ ID NO: 10576, SEQ ID NO: 10578,
SEQ ID NO: 10580, SEQ ID NO: [illegible], SEQ ID NO: 10584, SEQ ID NO: 10586, SEQ ID NO: 10588,
SEQ ID NO: 10590, SEQ ID NO: 10592, SEQ ID NO: 10594, SEQ ID NO: 10596, SEQ ID NO: 10597,
SEQ ID NO: 10599, SEQ ID NO: 10601, SEQ ID NO: 10603, SEQ ID NO: 10604, SEQ ID NO: 10606,
SEQ ID NO: 10607, SEQ ID NO: 10609, SEQ ID NO: 10611, SEQ ID NO: 10613, SEQ ID NO: 10614,
SEQ ID NO: 10615, SEQ ID NO: 10616, SEQ ID NO: 10618, SEQ ID NO: 10619, SEQ ID NO: 10620,
SEQ ID NO: 10622, SEQ ID NO: 10624, SEQ ID NO: 10625, SEQ ID NO: 10627, SEQ ID NO: [illegible],
SEQ ID NO: 10632, SEQ ID NO: 10633, SEQ ID NO: 10635, SEQ ID NO: 10637, SEQ ID NO: 10639,
SEQ ID NO: 10641, SEQ ID NO: 10642, SEQ ID NO: 10644, SEQ ID NO: 10646, SEQ ID NO: 10647,
SEQ ID NO: 10648, SEQ ID NO: 10649, SEQ ID NO: [illegible], SEQ ID NO: 10652, SEQ ID NO: 10654,
SEQ ID NO: 10655, SEQ ID NO: 10656, SEQ ID NO: 10658, SEQ ID NO: 10659, SEQ ID NO: 10661,
SEQ ID NO: 10663, SEQ ID NO: 10665, SEQ ID NO: 10667, SEQ ID NO: 10669, SEQ ID NO: 10670,
SEQ ID NO: 10671, SEQ ID NO: 10673, SEQ ID NO: 10674, SEQ ID NO: 10676, SEQ ID NO: 10678,
SEQ ID NO: [illegible], SEQ ID NO: 10683, SEQ ID NO: [illegible], SEQ ID NO: 10687, SEQ ID NO: 10689,
SEQ ID NO: 10691, SEQ ID NO: 10693, SEQ ID NO: 10695, SEQ ID NO: [illegible], SEQ ID NO: 10698,
SEQ ID NO: 10700, SEQ ID NO: 10702, SEQ ID NO: 10704, SEQ ID NO: 10706, SEQ ID NO: 10708,
SEQ ID NO: 10710, SEQ ID NO: 10711, SEQ ID NO: 10713, SEQ ID NO: 10715, SEQ ID NO: 10717,
SEQ ID NO: 10718, SEQ ID NO: 10720, SEQ ID NO: 10722, SEQ ID NO: 10723, SEQ ID NO: 10725,
SEQ ID NO: 10727, SEQ ID NO: 10728, SEQ ID NO: 10730, SEQ ID NO: 10732, SEQ ID NO: 10734,
SEQ ID NO: 10736, SEQ ID NO: 10738, SEQ ID NO: 10740, SEQ ID NO: 10742, SEQ ID NO: 10744,
SEQ ID NO: [illegible], SEQ ID NO: 10748, SEQ ID NO: 10750, SEQ ID NO: 10752, SEQ ID NO: 10753,
SEQ ID NO: 10754, SEQ ID NO: 10756, SEQ ID NO: 10757, SEQ ID NO: 10758, SEQ ID NO: 10760,
SEQ ID NO: 10761, SEQ ID NO: 10763, SEQ ID NO: 10765, SEQ ID NO: 10767, SEQ ID NO: 10769,
SEQ ID NO: 10771, SEQ ID NO: 10773, SEQ ID NO: 10774, SEQ ID NO: 10776, SEQ ID NO: 10778,
SEQ ID NO: 10780, SEQ ID NO: 10781, SEQ ID NO: 10783,
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NO: 12019, SEQ ID NO: 12020, SEQ ID NO: 12022, SEQ ID NO: 12024. SEQ ID NO: 12026, 
SEQ ID NO: 12028, SEQ ID NO: 12030, SEQ ID NO: 12032. SEQ ID NO: 12034, SEQ ID 
NO: 12036, SEQ ID NO: 12038, SEQ ID NO: 12040. SEQ ID NO: 12042. SEQ ID NO: 12044, 
5 SEQ ID NO: 12046, SEQ ID NO: 12048, SEQ ID NO: 12049, SEQ ID NO: 12051, SEQ ID 

NO: 12053, SEQ ID NO: 12055, SEQ ID NO: 12056, SEQ ID NO: 12058. SEQ ID NO: 12060, 
SEQ ID NO: 12062, SEQ ED NO: 12064, SEQ ID NO: 12066. SEQ ID NO: 12068. SEQ ID 
NO: 12070, SEQ ID NO: 12072, SEQ ID NO: 12074, SEQ ID NO: 12076, SEQ ID NO: 12078, 
SEQ ID NO: 12080, 

» SEQ ID NO: 12082, SEQ ID NO: 12084, SEQ ID NO: 12086. SEQ ID NO: 12088. SEQ ID 

NO- 12090 SEQ ID NO: 12092, SEQ ID NO: 12094, SEQ ID NO: 12096, SEQ ID NO: 12097, 
SEO ID NO: 12099, SEQ ID NO: 12101, SEQ ID NO: 12103, SEQ ID NO: 12105. SEQ ED 
NCM2107 SEQ ID NO: 12109. SEQ ID NO: 12111. SEQ ID NO: 12113, SEQ ID NO: 12115, 
SEO ID NO: 121 17, SEQ ID NO: 12119, SEQ ID NO: 12121. SEQ ID NO: 12123, SEQ ID 
15 NO 12125 SEO ID NO: 12127, SEQ ID NO: 12129. SEQ ID NO: 12131. SEQ ID NO: 12133. 

SEO ID NO- 12135, SEQ ID NO: 12137, SEQ ID NO: 12139, SEQ ID NO: 12141, SEQ ID 
NO- 12143 SEQ ID NO: 12145, SE.Q ID NO: 12147, SEQ ID NO: 12149, SEQ ID NO: 12151, 
SEO ID NO- 12153, SEQ ID NO: 12155, SEQ ID NO: 12157, SEQ ID NO: 12159. SEQ ID 
NO" 12161 SEQ ID NO: 12163, SEQ ID NO: 12165, SEQ ID NO: 12167, SEQ ID NO: 12169, 
20 SEO ID NO- 12171, SEQ ID NO: 12173, SEQ ID NO: 12175, SEQ ID NO: 12177, SEQ ED _ 

NO 12179 SEQ ID NO: 12181, SEQ ID NO: 12183. SEQ ED NO: 12185, SEQ ID NO: 12187, 
SEQ ID NO- 12189. SEQ ED NO: 12191. SEQ ID NO: 12193. SEQ ED NO: 12195, SEQ ED 
NO 12197 SEQ ID NO: 12199, SEQ ID NO: 12201; SEQ ED NO: 12203, SEQ ID NO: 12205,. 
SEO ID NO- 12207, SEQ ID NO: 12209, SEQ ID NO: 12211, SEQ ID NO: 12213, SEQ ED . 
NO- 12215 SEQ ID NO: 12216. SEQ ID NO: 12218. SEQ ID NO: 12220. SEQ ID NO: 12222. 
SEO ID NO- 12224, SEQ ID NO: 12226, SEQ ID NO: 12228, SEQ ID NO: 12230, SEQ ED 
Na 12232 SEQ ID NO: 12234, SEQ ID NO: 12236, SEQ ID NO: 12238, SEQ ID NO: 12240, 
SEO ID NO- 12242, SEQ ID NO: 12244, SEQ ID NO: 12246. SEQ ID NO: 12248. SEQ ID 
NO- 12250 SEQ ID NO: 12252, SEQ ID NO: 12254, SEQ ID NO: 12256, SEQ ID NO: 12258, 
SEQ ID NO: 12260, SEQ ID NO: 12262. SEQ ID NO: 12264. SEQ ID NO: 1226£ SEQ ID 
NO: 12268. SEQ ID NO: 12270. SEQ ID NO: 12272, SEQ ID NO: 12274. SEQ ID NO: 12276, 

SEO ID NO : 12279. SEQ ID NO: 12281, SEQ ID NO: 12283, SEQ ID NO: 12285, SEQ ID 
35 NO 12287 SEQ ID NO: 12289, SEQ ID NO: 12291, SEQ ID NO: 12293, SEQ ID NO: 12295. 

SEQ ID NO- 12297, SEQ ID NO: 12299. SEQ ID NO: 12301. SEQ ID NO: 12303, SEQ ID 
NO- 12305 SEQ ID NO: 12307, SEQ ID NO: 12309, SEQ ID NO: 1231 1, SEQ ID NO: 12313, 
SEQ ID NO: 12315, SEQ ID NO: 12317, SEQ ID NO: 12319, SEQ ID NO: 12321, SEQ W 
NO- 12323 SEQ ID NO: 12325, SEQ ID NO: 12327, SEQ ID NO: 12329, SEQ ID NO: 12331, 
40 SEQ ID NO: 12333. SEQ ID NO: 12335. SEQ ID NO: 12337, SEQ ID NO: 12339, SEQ ID " 

NO- 12341 SEQ ED NO: 12343. SEQ ID NO: 12345, SEQ ID NO: 12347, SEQ ID NO: 12349, 
SEQ ID NO: 12351, SEQ ID NO: 12353, SEQ ID NO: 12354, SEQ ID NO: 12356, SEQ ID 
NO- 12358 SEQ ID NO: 12360, SEQ ID NO: 12362, SEQ ID NO: 12364, SEQ ID NO: 12366, 
SEQ ID NO: 12368, SEQ ID NO: 12370. SEQ ID NO: 12372. SEQ ID NO: 12374, SEQ ID > 
NO 12376 SEQ ED NO: 12378, SEQ ID NO: 12380, SEQ ID NO: 12382, SEQ ID NO: 12384. 
SEQ ID NO: 12386. SEQ ID NO: 12388. SEQ ID NO: 12390. SEQ ID NO: 12392, SEQ ID 
NO 12394 SEQ ID NO 12396. SEQ ID NO: 12398, SEQ ID NO: 12400, SEQ ID NO: 12402, 
SEO ED NO- 12403 SEQ ED NO: 12405, SEQ ED NO: 12407, SEQ ID NO: 12409, SEQ ED 
NO 12410 SEO ED NO- 12412, SEQ ID NO: 12414. SEQ ID NO: 12416, SEQ ED NO: 12417, 
SEO ED NO- 12419 SEQ ID NO: 12421. SEQ ID NO: 12423, SEQ ID NO: 12425, SEQ ID 
NO 12427 SEO ED NO: 12429, SEQ ID NO: 12431, SEQ ID NO: 12433, SEQ ID NO: 12435, 
SEO ED NO- 12436 SEQ ED NO: 12438. SEQ ID NO: 12440, SEQ ED NO: 12442, SEQ DD 
NO 12444 SEO ID NO 12446. SEQ ID NO: 12448. SEQ ED NO: 12450. SEQ ID NO: 12452. 
SEQ ED NO- 12454 SEQ ID NO: 12456, SEQ ID NO: 12458. SEQ ID NO: 12460. SEQ DD 
SEQ ID NO: 18938, SEQ ID NO: 18940, SEQ ID NO: 18941, SEQ ID NO: 18943, SEQ ID NO: 18944,
SEQ ID NO: 18946, SEQ ID NO: 18947, SEQ ID NO: 18949, SEQ ID NO: 18951, SEQ ID NO: 18953,
SEQ ID NO: 18955, SEQ ID NO: 18956, SEQ ID NO: 18957, SEQ ID NO: 18958, SEQ ID NO: 18959,
SEQ ID NO: 18960, SEQ ID NO: 18962, SEQ ID NO: 18964, SEQ ID NO: 18966, SEQ ID NO: 18968,
SEQ ID NO: 18969, SEQ ID NO: 18970, SEQ ID NO: 18972, SEQ ID NO: 18973, SEQ ID NO: 18975,
SEQ ID NO: 18976, SEQ ID NO: 18978, SEQ ID NO: 18980, SEQ ID NO: 18981, SEQ ID NO: 18982,
SEQ ID NO: 18983, SEQ ID NO: 18984, SEQ ID NO: 18985, SEQ ID NO: 18986, SEQ ID NO: 18987,
SEQ ID NO: 18988, SEQ ID NO: 18989, SEQ ID NO: 18990, SEQ ID NO: 18992, SEQ ID NO: 18993,
SEQ ID NO: 18995, SEQ ID NO: 18997, SEQ ID NO: 18998, SEQ ID NO: 18999, SEQ ID NO: 19000,
SEQ ID NO: 19001, SEQ ID NO: 19002, SEQ ID NO: 19004, SEQ ID NO: 19006, SEQ ID NO: 19007,
SEQ ID NO: 19009, SEQ ID NO: 19011, SEQ ID NO: 19012, SEQ ID NO: 19013, SEQ ID NO: 19014,
SEQ ID NO: 19016, SEQ ID NO: 19018, SEQ ID NO: 19020, SEQ ID NO: 19022, SEQ ID NO: 19024,
and SEQ ID NO: 19025;

(b) a polynucleotide comprising a nucleotide sequence encoding a protein comprising the amino acid sequence set forth in any one of the following SEQ ID NOs:
SEQ ID NO: 10469, SEQ ID NO: 10474, SEQ ID NO: 10476, SEQ ID NO: 10478, SEQ ID NO: 10480,
SEQ ID NO: 10482, SEQ ID NO: 10484, SEQ ID NO: 10486, SEQ ID NO: 10490, SEQ ID NO: 10492,
SEQ ID NO: 10494, SEQ ID NO: 10499, SEQ ID NO: 10501, SEQ ID NO: 10506, SEQ ID NO: 10509,
SEQ ID NO: 10513, SEQ ID NO: 10515, SEQ ID NO: 10518, SEQ ID NO: 10520, SEQ ID NO: 10522,
SEQ ID NO: 10525, SEQ ID NO: 10527, SEQ ID NO: 10531, SEQ ID NO: 10533, SEQ ID NO: 10536,
SEQ ID NO: 10538, SEQ ID NO: 10541, SEQ ID NO: 10544, SEQ ID NO: 10547, SEQ ID NO: 10549,
SEQ ID NO: 10552, SEQ ID NO: 10554, SEQ ID NO: 10559, SEQ ID NO: 10561, SEQ ID NO: 10563,
SEQ ID NO: 10565, SEQ ID NO: 10568, SEQ ID NO: 10570, SEQ ID NO: 10572, SEQ ID NO: 10575,
SEQ ID NO: 10577, SEQ ID NO: 10579, SEQ ID NO: 10581, SEQ ID NO: 10583, SEQ ID NO: 10585,
SEQ ID NO: 10587, SEQ ID NO: 10589, SEQ ID NO: 10591, SEQ ID NO: 10593, SEQ ID NO: 10595,
SEQ ID NO: 10598, SEQ ID NO: 10600, SEQ ID NO: 10602, SEQ ID NO: 10605, SEQ ID NO: 10608,
SEQ ID NO: 10610, SEQ ID NO: 10612, SEQ ID NO: 10617, SEQ ID NO: 10621, SEQ ID NO: 10623,
SEQ ID NO: 10626, SEQ ID NO: 10628, SEQ ID NO: 10631, SEQ ID NO: 10634, SEQ ID NO: 10636,
SEQ ID NO: 10638, SEQ ID NO: 10640, SEQ ID NO: 10643, SEQ ID NO: 10645, SEQ ID NO: 10651,
SEQ ID NO: 10653, SEQ ID NO: 10657, SEQ ID NO: 10660, SEQ ID NO: 10662, SEQ ID NO: 10664,
SEQ ID NO: 10666, SEQ ID NO: 10668, SEQ ID NO: 10672, SEQ ID NO: 10675, SEQ ID NO: 10677,
SEQ ID NO: 10679, SEQ ID NO: 10681, SEQ ID NO: 10684, SEQ ID NO: 10686, SEQ ID NO: 10688,
SEQ ID NO: 10690, SEQ ID NO: 10692, SEQ ID NO: 10694, SEQ ID NO: 10697, SEQ ID NO: 10699,
SEQ ID NO: 10701, SEQ ID NO: 10703, SEQ ID NO: 10705, SEQ ID NO: 10707, SEQ ID NO: 10709,
SEQ ID NO: 10712, SEQ ID NO: 10714, SEQ ID NO: 10716, SEQ ID NO: 10719, SEQ ID NO: 10721,
SEQ ID NO: 10724, SEQ ID NO: 10726, SEQ ID NO: 10729, SEQ ID NO: 10731, SEQ ID NO: 10733,
SEQ ID NO: 10735, SEQ ID NO: 10737, SEQ ID NO: 10739, SEQ ID NO: 10741, SEQ ID NO: 10743,
SEQ ID NO: 10745, SEQ ID NO: 10747, SEQ ID NO: 10749, SEQ ID NO: 10751, SEQ ID NO: 10755,
SEQ ID NO: 10759, SEQ ID NO: 10762, SEQ ID NO: 10764, SEQ ID NO: 10766, SEQ ID NO: 10768,
SEQ ID NO: 10770, SEQ ID NO: 10772, SEQ ID NO: 10775, SEQ ID NO: 10777, SEQ ID NO: 10779,
SEQ ID NO: 10782, SEQ ID NO: 10784, SEQ ID NO: 10787, SEQ ID NO: 10789, SEQ ID NO: 10791,
SEQ ID NO: 10794, SEQ ID NO: 10796, SEQ ID NO: 10798, SEQ ID NO: 10801, SEQ ID NO: 10803,
SEQ ID NO: 10806, SEQ ID NO: 10809, SEQ ID NO: 10811, SEQ ID NO: 10813, SEQ ID NO: 10816,
SEQ ID NO: 10819, SEQ ID NO: 10821, SEQ ID NO: 10823, SEQ ID NO: 10825, SEQ ID NO: 10827,
SEQ ID NO: 10829, SEQ ID NO: 10833, SEQ ID NO: 10835, SEQ ID NO: 10837, SEQ ID NO: 10839,
SEQ ID NO: 10843, SEQ ID NO: 10846, SEQ ID NO: 10848, SEQ ID NO: 10851,
SEQ ID NO: 11836, SEQ ID NO: 11838, SEQ ID NO: 11840, SEQ ID NO: 11842, SEQ ID NO: 11844,
SEQ ID NO: 11846, SEQ ID NO: 11848, SEQ ID NO: 11852, SEQ ID NO: 11854, SEQ ID NO: 11857,
SEQ ID NO: 11860, SEQ ID NO: 11862, SEQ ID NO: 11864, SEQ ID NO: 11866, SEQ ID NO: 11868,
SEQ ID NO: 11872, SEQ ID NO: 11874, SEQ ID NO: 11876, SEQ ID NO: 11878, SEQ ID NO: 11881,
SEQ ID NO: 11883, SEQ ID NO: 11885, SEQ ID NO: 11887, SEQ ID NO: 11889, SEQ ID NO: 11891,
SEQ ID NO: 11894, SEQ ID NO: 11896, SEQ ID NO: 11898, SEQ ID NO: 11900, SEQ ID NO: 11902,
SEQ ID NO: 11904, SEQ ID NO: 11906, SEQ ID NO: 11908, SEQ ID NO: 11910, SEQ ID NO: 11912,
SEQ ID NO: 11914, SEQ ID NO: 11917, SEQ ID NO: 11920, SEQ ID NO: 11922, SEQ ID NO: [illegible],
SEQ ID NO: 11926, SEQ ID NO: 11929, SEQ ID NO: 11931, SEQ ID NO: 11933, SEQ ID NO: 11935,
SEQ ID NO: 11937, SEQ ID NO: 11939, SEQ ID NO: 11941, SEQ ID NO: 11943, SEQ ID NO: 11945,
SEQ ID NO: 11947, SEQ ID NO: 11950, SEQ ID NO: 11952, SEQ ID NO: 11954, SEQ ID NO: 11956,
SEQ ID NO: 11958, SEQ ID NO: 11960, SEQ ID NO: 11962, SEQ ID NO: 11964, SEQ ID NO: 11966,
SEQ ID NO: 11968, SEQ ID NO: 11970, SEQ ID NO: 11972, SEQ ID NO: 11974, SEQ ID NO: 11976,
SEQ ID NO: 11979, SEQ ID NO: 11981, SEQ ID NO: 11983, SEQ ID NO: 11985, SEQ ID NO: 11987,
SEQ ID NO: 11989, SEQ ID NO: 11991, SEQ ID NO: 11993, SEQ ID NO: 11995, SEQ ID NO: 11997,
SEQ ID NO: 11999, SEQ ID NO: 12001, SEQ ID NO: 12004, SEQ ID NO: 12006, SEQ ID NO: 12009,
SEQ ID NO: 12011, SEQ ID NO: 12013, SEQ ID NO: 12015, SEQ ID NO: 12017, SEQ ID NO: 12021,
SEQ ID NO: 12023, SEQ ID NO: 12025, SEQ ID NO: 12027, SEQ ID NO: 12029, SEQ ID NO: 12031,
SEQ ID NO: 12033, SEQ ID NO: 12035, SEQ ID NO: 12037, SEQ ID NO: 12039, SEQ ID NO: 12041,
SEQ ID NO: 12043, SEQ ID NO: 12045, SEQ ID NO: 12047, SEQ ID NO: 12050, SEQ ID NO: 12054,
SEQ ID NO: 12057, SEQ ID NO: 12059, SEQ ID NO: 12061, SEQ ID NO: 12063, SEQ ID NO: 12065,
SEQ ID NO: 12067, SEQ ID NO: 12069, SEQ ID NO: 12071, SEQ ID NO: 12073, SEQ ID NO: 12075,
SEQ ID NO: 12077, SEQ ID NO: 12079, SEQ ID NO: 12081, SEQ ID NO: 12083, SEQ ID NO: 12085,
SEQ ID NO: 12087, SEQ ID NO: 12089, SEQ ID NO: 12091, SEQ ID NO: 12093, SEQ ID NO: 12095,
SEQ ID NO: 12098, SEQ ID NO: 12100, SEQ ID NO: 12102, SEQ ID NO: 12104, SEQ ID NO: 12106,
SEQ ID NO: 12108, SEQ ID NO: 12110, SEQ ID NO: 12112, SEQ ID NO: 12114, SEQ ID NO: [illegible],
SEQ ID NO: 12118, SEQ ID NO: 12120, SEQ ID NO: 12122, SEQ ID NO: 12124, SEQ ID NO: 12126,
SEQ ID NO: 12128, SEQ ID NO: 12130, SEQ ID NO: 12132, SEQ ID NO: 12134, SEQ ID NO: 12136,
SEQ ID NO: 12138, SEQ ID NO: 12140, SEQ ID NO: 12142, SEQ ID NO: 12144, SEQ ID NO: [illegible],
SEQ ID NO: 12148, SEQ ID NO: 12150, SEQ ID NO: [illegible], SEQ ID NO: 12154, SEQ ID NO: 12156,
SEQ ID NO: 12158, SEQ ID NO: 12160, SEQ ID NO: 12162, SEQ ID NO: 12164, SEQ ID NO: 12166,
SEQ ID NO: 12168, SEQ ID NO: 12170, SEQ ID NO: [illegible], SEQ ID NO: 12174, SEQ ID NO: 12176,
SEQ ID NO: 12178, SEQ ID NO: 12180, SEQ ID NO: 12182, SEQ ID NO: 12184, SEQ ID NO: 12186,
SEQ ID NO: 12188, SEQ ID NO: 12190, SEQ ID NO: 12192, SEQ ID NO: 12194, SEQ ID NO: 12196,
SEQ ID NO: 12198, SEQ ID NO: 12200, SEQ ID NO: 12202, SEQ ID NO: 12204, SEQ ID NO: 12206,
SEQ ID NO: 12208, SEQ ID NO: 12210, SEQ ID NO: 12212, SEQ ID NO: 12214, SEQ ID NO: 12217,
SEQ ID NO: 12219, SEQ ID NO: 12221, SEQ ID NO: 12223, SEQ ID NO: [illegible], SEQ ID NO: 12227,
SEQ ID NO: 12229, SEQ ID NO: 12231, SEQ ID NO: 12233, SEQ ID NO: 12235, SEQ ID NO: 12237,
SEQ ID NO: 12239, SEQ ID NO: 12241, SEQ ID NO: 12243, SEQ ID NO: 12245, SEQ ID NO: 12247,
SEQ ID NO: 12249, SEQ ID NO: 12251, SEQ ID NO: 12253, SEQ ID NO: 12257, SEQ ID NO: 12259,
SEQ ID NO: 12261, SEQ ID NO: 12263, SEQ ID NO: 12265, SEQ ID NO: 12267, SEQ ID NO: 12269,
SEQ ID NO: 12271, SEQ ID NO: 12273, SEQ ID NO: 12275, SEQ ID NO: 12277, SEQ ID NO: 12280,
SEQ ID NO: 12282, SEQ ID NO: 12284, SEQ ID NO: 12286, SEQ ID NO: 12288, SEQ ID NO: 12290,
SEQ ID NO: 12292, SEQ ID NO: 12294, SEQ ID NO: 12296, SEQ ID NO: 12298, SEQ ID NO: 12300,
SEQ ID NO: 12302, SEQ ID NO: 12304, SEQ ID NO: 12306, SEQ ID NO: 12308, SEQ ID NO: 12310,
SEQ ID NO: 18662, SEQ ID NO: 18665, SEQ ID NO: 18668, SEQ ID NO: 18672, SEQ ID NO: 18674,
SEQ ID NO: 18676, SEQ ID NO: 18678, SEQ ID NO: 18680, SEQ ID NO: 18682, SEQ ID NO: 18686,
SEQ ID NO: 18688, SEQ ID NO: 18691, SEQ ID NO: 18693, SEQ ID NO: 18695, SEQ ID NO: 18697,
SEQ ID NO: 18699, SEQ ID NO: 18701, SEQ ID NO: 18703, SEQ ID NO: 18705, SEQ ID NO: 18709,
SEQ ID NO: 18711, SEQ ID NO: 18713, SEQ ID NO: 18715, SEQ ID NO: 18717, SEQ ID NO: 18720,
SEQ ID NO: 18722, SEQ ID NO: 18724, SEQ ID NO: 18726, SEQ ID NO: 18729, SEQ ID NO: 18731,
SEQ ID NO: 18733, SEQ ID NO: 18735, SEQ ID NO: 18737, SEQ ID NO: 18739, SEQ ID NO: 18741,
SEQ ID NO: 18743, SEQ ID NO: 18745, SEQ ID NO: 18747, SEQ ID NO: 18749, SEQ ID NO: 18751,
SEQ ID NO: 18753, SEQ ID NO: 18759, SEQ ID NO: 18763, SEQ ID NO: 18765, SEQ ID NO: 18770,
SEQ ID NO: 18773, SEQ ID NO: 18775, SEQ ID NO: 18777, SEQ ID NO: 18779, SEQ ID NO: 18781,
SEQ ID NO: 18783, SEQ ID NO: 18785, SEQ ID NO: 18787, SEQ ID NO: 18790, SEQ ID NO: 18793,
SEQ ID NO: 18795, SEQ ID NO: 18797, SEQ ID NO: 18800, SEQ ID NO: 18802, SEQ ID NO: 18804,
SEQ ID NO: 18806, SEQ ID NO: 18809, SEQ ID NO: 18811, SEQ ID NO: 18813, SEQ ID NO: 18815,
SEQ ID NO: 18817, SEQ ID NO: 18819, SEQ ID NO: 18822, SEQ ID NO: 18824, SEQ ID NO: 18826,
SEQ ID NO: 18828, SEQ ID NO: 18830, SEQ ID NO: 18832, SEQ ID NO: 18834, SEQ ID NO: 18836,
SEQ ID NO: 18840, SEQ ID NO: 18843, SEQ ID NO: 18847, SEQ ID NO: 18850, SEQ ID NO: 18853,
SEQ ID NO: 18855, SEQ ID NO: 18857, SEQ ID NO: 18859, SEQ ID NO: 18862, SEQ ID NO: 18865,
SEQ ID NO: 18868, SEQ ID NO: 18870, SEQ ID NO: 18874, SEQ ID NO: 18876, SEQ ID NO: 18879,
SEQ ID NO: 18882, SEQ ID NO: 18884, SEQ ID NO: 18888, SEQ ID NO: 18891, SEQ ID NO: 18894,
SEQ ID NO: 18896, SEQ ID NO: 18898, SEQ ID NO: 18900, SEQ ID NO: 18902, SEQ ID NO: 18906,
SEQ ID NO: 18908, SEQ ID NO: 18910, SEQ ID NO: 18912, SEQ ID NO: 18914, SEQ ID NO: 18916,
SEQ ID NO: 18918, SEQ ID NO: 18920, SEQ ID NO: 18922, SEQ ID NO: 18924, SEQ ID NO: 18926,
SEQ ID NO: 18929, SEQ ID NO: 18931, SEQ ID NO: 18933, SEQ ID NO: 18935, SEQ ID NO: 18937,
SEQ ID NO: 18939, SEQ ID NO: 18942, SEQ ID NO: 18945, SEQ ID NO: 18948, SEQ ID NO: 18950,
SEQ ID NO: 18952, SEQ ID NO: 18954, SEQ ID NO: 18961, SEQ ID NO: 18963, SEQ ID NO: 18965,
SEQ ID NO: 18967, SEQ ID NO: 18971, SEQ ID NO: 18974, SEQ ID NO: 18977, SEQ ID NO: 18979,
SEQ ID NO: 18991, SEQ ID NO: 18994, SEQ ID NO: 18996, SEQ ID NO: 19003, SEQ ID NO: 19005,
SEQ ID NO: 19008, SEQ ID NO: 19010, SEQ ID NO: 19015, SEQ ID NO: 19017, SEQ ID NO: 19019,
SEQ ID NO: 19021, and SEQ ID NO: 19023;

(c) a polynucleotide comprising a nucleotide sequence encoding a protein comprising an amino acid sequence
selected from the amino acid sequences of (b), in which one or more amino acids are substituted, deleted,
inserted, and/or added, wherein said protein is functionally equivalent to the protein comprising said amino
acid sequence selected from the amino acid sequences of (b);

(d) a polynucleotide that hybridizes with a polynucleotide comprising a nucleotide sequence selected from the
nucleotide sequences of (a), and that comprises a nucleotide sequence encoding a protein functionally
equivalent to the protein encoded by the nucleotide sequence selected from the nucleotide sequences of (a);

(e) a polynucleotide comprising a nucleotide sequence encoding a partial amino acid sequence of a protein 
encoded by the polynucleotide of (a) to (d); 

(f) a polynucleotide comprising a nucleotide sequence with at least 70% identity to the nucleotide sequence of (a).

9. A substantially pure protein encoded by the polynucleotide of claim 8. 

10. An antibody against the protein or peptide of any one of claims 6, 7, and 9.